A shared file system master/slave group works by sharing a common data store that is located on a shared file system. Brokers automatically configure themselves to operate in master mode or slave mode based on their ability to grab an exclusive lock on the underlying data store.
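The broker's locking logic is internal to its persistence adapter, but the following minimal Java sketch illustrates the general idea: each broker tries to acquire an exclusive lock on a file in the shared data store, the broker that succeeds behaves as the master, and the others wait as slaves. The lock file path and retry interval shown here are hypothetical, not the broker's actual values.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedStoreLockSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical lock file inside the shared data store directory.
        Path lockFile = Paths.get("/sharedFileSystem/sharedBrokerData/lock");
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);

        FileLock lock = null;
        while (lock == null) {
            // tryLock() returns null when another process already holds the lock.
            lock = channel.tryLock();
            if (lock == null) {
                System.out.println("Lock held by the master; waiting as a slave...");
                Thread.sleep(5000); // hypothetical retry interval
            }
        }
        System.out.println("Acquired exclusive lock; acting as master.");
        // A real broker would now start its transport connectors and accept clients.
    }
}
```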
The advantages of this configuration are:

- A group can consist of more than two members; a group can have an arbitrary number of slaves.
- A failed node can rejoin the group without any manual intervention. When a new node joins, or rejoins, the group, it automatically falls into slave mode until it can acquire an exclusive lock on the data store.
The disadvantage of this configuration is that the shared file system is a single point of failure. This disadvantage can be mitigated by using a storage area network (SAN) with built-in high availability (HA) functionality, in which case the SAN handles replication and failover of the data store.
The shared file system requires an efficient and reliable file locking mechanism to function correctly. Not all SAN file systems meet the locking requirements of the shared file system configuration.
**Warning:** OCFS2 is incompatible with this master/slave configuration, because it does not support mutex file locking from Java.
**Warning:** NFSv3 is incompatible with this master/slave configuration. In the event of an abnormal termination of a master broker that is an NFSv3 client, the NFSv3 server does not time out the lock held by the client. This renders the Fuse Message Broker data directory inaccessible: the slave broker cannot acquire the lock and therefore cannot start up. In this case, the only way to unblock the master/slave group on NFSv3 is to reboot all broker instances.
On the other hand, NFSv4 is compatible with this master/slave configuration, because its design includes timeouts for locks. When an NFSv4 client holding a lock terminates abnormally, the lock is automatically released after 30 seconds, allowing another NFSv4 client to grab the lock.
Figure 3.3 shows the initial state of a shared file system master/slave group. When all of the brokers are started, one of them grabs the exclusive lock on the broker data store and becomes the master. All of the other brokers remain slaves and pause while waiting for the exclusive lock to be freed up. Only the master starts its transport connectors, so all of the clients connect to it.
Figure 3.4 shows the state of the master/slave group after the original master has shut down or failed. As soon as the master gives up the lock (or after a suitable timeout, if the master crashes), the lock on the data store frees up and another broker grabs the lock and gets promoted to master.
After the clients lose their connection to the original master, they automatically try all of the other brokers listed in the failover URL. This enables them to find and connect to the new master.
In the shared file system master/slave configuration, there is nothing special to distinguish a master broker from the slave brokers. The membership of a particular master/slave group is defined by the fact that all of the brokers in the group use the same persistence layer and store their data in the same shared directory.
Example 3.5 shows the broker configuration for a shared file system master/slave group that shares a data store located at /sharedFileSystem/sharedBrokerData and uses the KahaDB persistence store.
Example 3.5. Shared File System Broker Configuration
    <broker ... >
      ...
      <persistenceAdapter>
        <kahaDB directory="/sharedFileSystem/sharedBrokerData"/>
      </persistenceAdapter>
      ...
    </broker>
All of the brokers in the group must share the same persistenceAdapter element.
Clients of a shared file system master/slave group must be configured with a failover URL that lists the URLs for all of the brokers in the group. Example 3.6 shows the client failover URL for a group that consists of three brokers: broker1, broker2, and broker3.
Example 3.6. Client URL for a Shared File System Master/Slave Group
failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)
For more information about using the failover protocol, see Static Failover.
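As a rough illustration, the following JMS client sketch uses the failover URL from Example 3.6 with the standard ActiveMQ client library. The queue name and message content are illustrative only. When the master fails, the failover transport transparently reconnects the client to whichever broker acquires the lock and becomes the new master.

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClientSketch {
    public static void main(String[] args) throws Exception {
        // Failover URL listing every broker in the master/slave group (Example 3.6).
        String brokerUrl =
            "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)";

        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // "TEST.QUEUE" is an illustrative destination name.
        MessageProducer producer =
            session.createProducer(session.createQueue("TEST.QUEUE"));
        TextMessage message =
            session.createTextMessage("hello from a failover-aware client");
        producer.send(message);

        connection.close();
    }
}
```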
You can restart the failed master at any time and it will rejoin the group. It rejoins as a slave broker, because one of the other brokers already owns the exclusive lock on the data store, as shown in Figure 3.5.