Graceful failover

Graceful failover lets administrators safely fail over a node from the cluster after all in-flight operations have completed, without data loss.

With graceful failover, all in-flight operations are completed first. For example, data that is being written to the node finishes writing and is transferred to the replica vBuckets. The replica vBuckets are then promoted to active vBuckets, and the active vBuckets on the failed-over node are transitioned to replica vBuckets. Because failover occurs only after the replica vBuckets are synchronized with the active vBuckets, graceful failover might take more time to fail over the node.
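The role swap described above can be sketched with a small in-memory model. This is only an illustration: the VBucket class, the node names, and the graceful_failover function are hypothetical and are not part of any Couchbase API, and the sketch assumes the replicas are already synchronized.

```python
from dataclasses import dataclass


@dataclass
class VBucket:
    """One vBucket and the nodes holding its copies (hypothetical model)."""
    vbucket_id: int
    active_node: str
    replica_nodes: list


def graceful_failover(vbuckets, leaving_node):
    """Swap roles for every vBucket whose active copy is on leaving_node."""
    for vb in vbuckets:
        if vb.active_node != leaving_node:
            continue
        if not vb.replica_nodes:
            raise ValueError(f"vBucket {vb.vbucket_id} has no replica to promote")
        # Assumption: the replica is already in sync with the active copy.
        promoted = vb.replica_nodes.pop(0)
        vb.replica_nodes.append(leaving_node)  # old active becomes a replica
        vb.active_node = promoted
    return vbuckets


vbuckets = [
    VBucket(0, "node-a", ["node-b"]),
    VBucket(1, "node-b", ["node-c"]),
    VBucket(2, "node-a", ["node-c"]),
]
graceful_failover(vbuckets, "node-a")
active_nodes = [vb.active_node for vb in vbuckets]
```

In this toy model every vBucket still has exactly one active copy after the swap, and none of the active copies remain on the node that was failed over.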

While a server node is being gracefully failed over (that is, while the in-flight operations are completing), the failover can be stopped and later restarted. After a node is failed over, it can be added back to the cluster via delta or full recovery.
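As a rough sketch, a graceful failover can be requested over the cluster's REST interface. The endpoint path /controller/startGracefulFailover and the otpNode form parameter are recalled from the Couchbase REST API and should be verified against the documentation for the server version in use. The host, node name, and credentials are placeholders, and build_graceful_failover_request is a hypothetical helper. The code only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_graceful_failover_request(cluster_url, otp_node, user, password):
    """Build (but do not send) a POST asking the cluster to gracefully fail over otp_node."""
    body = urllib.parse.urlencode({"otpNode": otp_node}).encode("ascii")
    req = urllib.request.Request(
        # Assumed endpoint; check it against your server version's REST reference.
        cluster_url.rstrip("/") + "/controller/startGracefulFailover",
        data=body,
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder values for illustration only.
request = build_graceful_failover_request(
    "http://127.0.0.1:8091", "ns_1@10.0.0.2", "Administrator", "password"
)
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the request easy to inspect first.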

Note: Hard failover immediately fails over a node from the cluster, which might lead to data loss. Hard failover is typically used when the node is in a bad state. Auto-failover is a hard failover.

The following conditions must be met for graceful failover; otherwise, hard failover is implemented.

  • The server node must be healthy.
  • Each active vBucket on the failed-over server node must have an equivalent replica vBucket.
  • At least one replica vBucket must be available.
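The three conditions above can be expressed as a simple check. This is a minimal sketch: the function name and the shape of its inputs are assumptions made for illustration, not how the server itself evaluates them.

```python
def can_fail_over_gracefully(node_healthy, active_vbuckets, replica_map):
    """Return True only if all three graceful-failover conditions hold.

    node_healthy    -- whether the node to be failed over is healthy
    active_vbuckets -- ids of the active vBuckets on that node
    replica_map     -- vBucket id -> list of other nodes holding an
                       available replica (hypothetical structure)
    """
    # Condition 1: the server node must be healthy.
    if not node_healthy:
        return False
    # Condition 3: at least one replica vBucket must be available.
    if not any(replica_map.values()):
        return False
    # Condition 2: every active vBucket on the node needs a replica.
    return all(replica_map.get(vb) for vb in active_vbuckets)


ok = can_fail_over_gracefully(True, [0, 1], {0: ["node-b"], 1: ["node-c"]})
missing_replica = can_fail_over_gracefully(True, [0, 1], {0: ["node-b"], 1: []})
unhealthy = can_fail_over_gracefully(False, [0, 1], {0: ["node-b"], 1: ["node-c"]})
```

When the check returns False, the text above implies that hard failover is the remaining option.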

For example:

  • In a 7-node cluster, if a bucket is configured for one replica, only one node can be gracefully failed over.
  • In a 7-node cluster, if a bucket is configured for two replicas, two nodes can be gracefully failed over.
  • In a 7-node cluster, if a bucket is configured for three replicas, three nodes can be gracefully failed over.
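The pattern in these examples is that the number of nodes that can be gracefully failed over matches the bucket's replica count. A minimal sketch of that rule follows. The function name is hypothetical, and the additional cap at one less than the node count is an assumption added so that small clusters give a sensible answer.

```python
def max_graceful_failovers(num_nodes, num_replicas):
    """Upper bound on nodes that can be gracefully failed over for one bucket."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if num_replicas < 0:
        raise ValueError("replica count cannot be negative")
    # One node per configured replica; the (num_nodes - 1) cap is an assumption.
    return min(num_replicas, num_nodes - 1)


seven_node_limits = [max_graceful_failovers(7, r) for r in (1, 2, 3)]
```

For the 7-node cluster in the examples, seven_node_limits holds the limits for one, two, and three replicas in that order.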

Unlike hard failover, graceful failover does not alter the number of active vBuckets. However, the number of replica vBuckets can be reduced, and the remaining replica vBuckets can be unevenly distributed.