This document provides steps for re-configuring a
replica set when only a minority of members are accessible.
You may need to use the procedure, for example, in a
geographically distributed replica set, where no local group of
members can reach a majority. See Replica Set Elections for more
information on this situation.
This procedure lets you recover while a majority of replica set
members are down or unreachable. You connect to any surviving member and
use the force option to the rs.reconfig() method.
The force option forces a new configuration onto the member. Use this procedure only to
recover from catastrophic interruptions. Do not use force every
time you reconfigure. Also, do not use the force option in any automatic
scripts and do not use force when there is still a primary.
To force reconfiguration:
Back up a surviving member.
Connect to a surviving member and save the current configuration.
Consider the following example commands for saving the configuration:
cfg=rs.conf()printjson(cfg)
On the same member, remove the down and unreachable members of the
replica set from the members array by
setting the array equal to the surviving members alone. Consider the
following example, which uses the cfg variable created in the
previous step:
On the same member, reconfigure the set by using the
rs.reconfig() command with the force option set to
true:
rs.reconfig(cfg,{force:true})
This operation forces the secondary to use the new configuration. The
configuration is then propagated to all the surviving members listed
in the members array. The replica set then elects a new primary.
Note
When you use force:true, the version number in the replica
set configuration increases significantly, by tens or hundreds
of thousands. This is normal and designed to prevent set version
collisions if you accidentally force re-configurations on both
sides of a network partition and then the network partitioning
ends.
If the failure or partition was only temporary, shut down or
decommission the removed members as soon as possible.