A shared file system master/slave group works by sharing a common data store that is located on a shared file system. Brokers automatically configure themselves to operate in master mode or slave mode based on their ability to grab an exclusive lock on the underlying data store.
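The broker's locking logic is internal to its persistence adapter, but the following minimal Java sketch illustrates the general idea: each broker tries to acquire an exclusive lock on a file in the shared data store, the broker that succeeds behaves as the master, and the others wait as slaves. The lock file path and retry interval shown here are hypothetical, not the broker's actual values.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedStoreLockSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical lock file inside the shared data store directory.
        Path lockFile = Paths.get("/sharedFileSystem/sharedBrokerData/lock");
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);

        FileLock lock = null;
        while (lock == null) {
            // tryLock() returns null when another process already holds the lock.
            lock = channel.tryLock();
            if (lock == null) {
                System.out.println("Lock held by the master; waiting as a slave...");
                Thread.sleep(5000); // hypothetical retry interval
            }
        }
        System.out.println("Acquired exclusive lock; acting as master.");
        // A real broker would now start its transport connectors and accept clients.
    }
}
```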
The advantages of this configuration are:

- A group can consist of more than two members; a group can have an arbitrary number of slaves.
- A failed node can rejoin the group without any manual intervention. When a new node joins, or rejoins, the group, it automatically falls into slave mode until it can acquire an exclusive lock on the data store.
The disadvantage of this configuration is that the shared file system is a single point of failure. This disadvantage can be mitigated by using a storage area network (SAN) with built-in high availability (HA) functionality, in which case the SAN handles replication and failover of the data store.
The shared file system requires an efficient and reliable file locking mechanism to function correctly. Not all SAN file systems meet the locking requirements of the shared file system configuration.
**Warning:** OCFS2 is incompatible with this master/slave configuration, because it does not support mutex file locking from Java.
**Warning:** NFSv3 is incompatible with this master/slave configuration. In the event of an abnormal termination of a master broker that is an NFSv3 client, the NFSv3 server does not time out the lock held by the client. This renders the Fuse Message Broker data directory inaccessible: the slave broker cannot acquire the lock and therefore cannot start up. In this case, the only way to unblock the master/slave group on NFSv3 is to reboot all broker instances.
On the other hand, NFSv4 is compatible with this master/slave configuration, because its design includes timeouts for locks. When an NFSv4 client holding a lock terminates abnormally, the lock is automatically released after 30 seconds, allowing another NFSv4 client to grab the lock.
Figure 3.3 shows the initial state of a shared file system master/slave group. When all of the brokers are started, one of them grabs the exclusive lock on the broker data store and becomes the master. All of the other brokers remain slaves and pause while waiting for the exclusive lock to be freed up. Only the master starts its transport connectors, so all of the clients connect to it.
Figure 3.4 shows the state of the master/slave group after the original master has shut down or failed. As soon as the master gives up the lock (or after a suitable timeout, if the master crashes), the lock on the data store frees up and another broker grabs the lock and gets promoted to master.
After the clients lose their connection to the original master, they automatically try all of the other brokers listed in the failover URL. This enables them to find and connect to the new master.
In the shared file system master/slave configuration, there is nothing special to distinguish a master broker from the slave brokers. The membership of a particular master/slave group is defined by the fact that all of the brokers in the group use the same persistence layer and store their data in the same shared directory.
Example 3.5 shows the broker configuration for a shared file system master/slave group that shares a data store located at /sharedFileSystem/sharedBrokerData and uses the KahaDB persistence store.
Example 3.5. Shared File System Broker Configuration
    <broker ... >
      ...
      <persistenceAdapter>
        <kahaDB directory="/sharedFileSystem/sharedBrokerData"/>
      </persistenceAdapter>
      ...
    </broker>
All of the brokers in the group must share the same persistenceAdapter element.
Clients of a shared file system master/slave group must be configured with a failover URL that lists the URLs for all of the brokers in the group. Example 3.6 shows the client failover URL for a group that consists of three brokers: broker1, broker2, and broker3.
Example 3.6. Client URL for a Shared File System Master/Slave Group
failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)
For more information about using the failover protocol, see Static Failover.
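As a rough illustration, the following JMS client sketch uses the failover URL from Example 3.6 with the standard ActiveMQ client library. The queue name and message content are illustrative only. When the master fails, the failover transport transparently reconnects the client to whichever broker acquires the lock and becomes the new master.

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClientSketch {
    public static void main(String[] args) throws Exception {
        // Failover URL listing every broker in the master/slave group (Example 3.6).
        String brokerUrl =
            "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)";

        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // "TEST.QUEUE" is an illustrative destination name.
        MessageProducer producer =
            session.createProducer(session.createQueue("TEST.QUEUE"));
        TextMessage message =
            session.createTextMessage("hello from a failover-aware client");
        producer.send(message);

        connection.close();
    }
}
```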
You can restart the failed master at any time and it will rejoin the group. It rejoins as a slave broker, because one of the other brokers already owns the exclusive lock on the data store, as shown in Figure 3.5.