Troubleshoot Sharded Clusters¶
This page describes common strategies for troubleshooting sharded cluster deployments.
Cursor Fails Because of Stale Config Data¶
A query returns the following warning when one or more of the mongos instances has not yet updated its cache of the cluster’s metadata from the config database:
could not initialize cursor across all shards because : stale config detected
This warning should not propagate back to your application. The warning will repeat until all the mongos instances refresh their caches. To force an instance to refresh its cache, run the flushRouterConfig command.
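As a minimal sketch, assuming a Python application that holds one PyMongo-style client per mongos (any object exposing `admin.command(...)`), the refresh could be issued against each router in turn. The helper name `flush_all_routers` and the labels used as dictionary keys are hypothetical.

```python
def flush_all_routers(clients):
    """Run flushRouterConfig on every mongos client; return {label: reply}.

    `clients` maps a label of your choosing to a client object connected
    directly to one mongos (assumption: one client per router).
    """
    replies = {}
    for label, client in clients.items():
        # flushRouterConfig is an administrative command, so it is sent
        # through the admin database of the mongos.
        replies[label] = client.admin.command("flushRouterConfig")
    return replies
```

With PyMongo, each client would typically be created from a connection string that names a single mongos, so that every router receives the command rather than whichever one the driver happens to select.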
Shard Keys and Cluster Availability¶
The most important considerations when choosing a shard key are:
- to ensure that MongoDB will be able to distribute data evenly among shards,
- to scale writes across the cluster, and
- to ensure that mongos can isolate most queries to a specific mongod.
Furthermore:
- Each shard should be a replica set. If a specific mongod instance fails, the remaining replica set members will elect another to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, that shard's data will be unavailable.
- If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
- If your shard key distributes data required for every operation throughout the cluster, then the failure of an entire shard will render the entire cluster unavailable.
In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.
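The effect of query isolation on availability can be illustrated with a toy model. This is only a sketch: the chunk map, shard names, and function names below are hypothetical, and real routing by mongos is considerably more involved. A query that includes the shard key touches one shard, while a query without it must contact every shard.

```python
import bisect

# Hypothetical chunk map: (exclusive upper bound of the shard key, shard name).
CHUNKS = [(100, "shardA"), (200, "shardB"), (float("inf"), "shardC")]

def shards_for_query(shard_key_value=None):
    """Return the shards a query must contact in this toy model."""
    if shard_key_value is None:
        # No shard key in the query: every shard must be consulted.
        return sorted({shard for _, shard in CHUNKS})
    bounds = [upper for upper, _ in CHUNKS]
    # The first chunk whose upper bound is greater than the value owns it.
    index = bisect.bisect_right(bounds, shard_key_value)
    return [CHUNKS[index][1]]

def query_available(shard_key_value, down_shards):
    """A query succeeds only if none of the shards it needs is down."""
    return not any(s in down_shards for s in shards_for_query(shard_key_value))
```

In this model, with `shardA` down, `query_available(150, {"shardA"})` is true because the query is isolated to `shardB`, whereas `query_available(None, {"shardA"})` is false because the unkeyed query needs every shard.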
Config Database String Error¶
Changed in version 3.2.
Starting in MongoDB 3.2, config servers can be deployed as replica sets. The mongos instances for the sharded cluster must specify the same config server replica set name but can specify the hostnames and ports of different members of the replica set.
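A quick consistency check over the settings of several mongos instances might look like the following sketch. It assumes the replica set form of the configDB value, `<replSetName>/<host:port>,<host:port>,...`; the function names and the hostnames in the usage note are hypothetical.

```python
def parse_configdb(value):
    """Split '<replSetName>/<host:port>,...' into (name, set of hosts)."""
    name, sep, hosts = value.partition("/")
    if not sep or not name or not hosts:
        raise ValueError(f"not a replica set configDB string: {value!r}")
    return name, {h.strip() for h in hosts.split(",") if h.strip()}

def same_replica_set(configdb_values):
    """True if every configDB value names the same config server replica set."""
    names = {parse_configdb(value)[0] for value in configdb_values}
    return len(names) == 1
```

For example, `same_replica_set(["cfgRS/cfg1.example.net:27019", "cfgRS/cfg2.example.net:27019"])` is true even though the member lists differ, while values naming two different replica sets would fail the check.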
Starting in 3.4, the use of the deprecated mirrored mongod instances as config servers (SCCC) is no longer supported. Before you can upgrade your sharded clusters to 3.4, you must convert your config servers from SCCC to CSRS.
To convert your config servers from SCCC to CSRS, see Upgrade Config Servers to Replica Set.
With earlier versions of MongoDB sharded clusters that use the topology of three mirrored mongod instances for config servers, the mongos instances in a sharded cluster must all specify an identical configDB string.
Avoid Downtime when Moving Config Servers¶
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.
moveChunk commit failed Error¶
At the end of a chunk migration, the shard must connect to the config database to update the chunk’s record in the cluster metadata. If the shard fails to connect to the config database, MongoDB reports the following errors:
ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>
ERROR: TERMINATING
When this happens, the primary member of the shard’s replica set then terminates to protect data consistency. If a secondary member can access the config database, data on the shard becomes accessible again after an election.
You must resolve the chunk migration failure separately. If you encounter this issue, contact the MongoDB User Group or MongoDB Support.
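To confirm that a shard primary terminated for this reason, its log can be searched for the message. The sketch below assumes that the version placeholders in the message are integers separated by `|`; the function name is hypothetical, and the exact log line layout around the message may differ between versions.

```python
import re

# Matches the message text quoted in this section; the numeric form of the
# version fields is an assumption.
_COMMIT_FAILED = re.compile(
    r"moveChunk commit failed: version is at (\d+)\|(\d+) instead of (\d+)\|(\d+)"
)

def find_commit_failures(log_lines):
    """Return (line_number, found_version, expected_version) per match."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = _COMMIT_FAILED.search(line)
        if match:
            n, nn, big_n, big_nn = map(int, match.groups())
            hits.append((number, (n, nn), (big_n, big_nn)))
    return hits
```

The function accepts any iterable of strings, so an open log file can be passed directly, for instance `find_commit_failures(open(path))` with `path` pointing at the shard's log.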