- Sharding >
- Shard Keys
Shard Keys¶
On this page
The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key is either an indexed field or indexed compound fields that exists in every document in the collection.
MongoDB partitions data in the collection using ranges of shard key values. Each range defines a non-overlapping range of shard key values and is associated with a chunk.
MongoDB attempts to distribute chunks evenly among the shards in the cluster. The shard key has a direct relationship to the effectiveness of chunk distribution. See Choosing a Shard Key.
Important
Once you shard a collection, the shard key and the shard key values are immutable; i.e.
- You cannot select a different shard key for that collection.
- You cannot update the values of the shard key fields.
Shard Key Specification¶
To shard a collection, you must specify the target collection and the
shard key to the sh.shardCollection()
method:
sh.shardCollection( namespace, key )
- The
namespace
parameter consists of a string<database>.<collection>
specifying the full namespace of the target collection. - The
key
parameter consists of a document containing a field and the index traversal direction for that field.
For instructions specific to sharding a collection using the hashed sharding strategy, see Shard a Collection using Hashed Sharding
For instructions specific to sharding a collection using the ranged sharding strategy, see Shard a Collection using Ranged Sharding.
Shard Key Indexes¶
All sharded collections must have an index that supports the shard key; i.e. the index can be an index on the shard key or a compound index where the shard key is a prefix of the index.
- If the collection is empty,
sh.shardCollection()
creates the index on the shard key if such an index does not already exists. - If the collection is not empty, you must create the index first
before using
sh.shardCollection()
.
If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
Unique Indexes¶
For a sharded collection, only the _id
field index and the index on
the shard key or a compound index where the shard key is a
prefix can be unique:
- You cannot shard a collection that has unique indexes on other fields.
- You cannot create unique indexes on other fields for a sharded collection.
Through the use of the unique index on the shard key, MongoDB can
enforce uniqueness on the shard key values. MongoDB enforces uniqueness
on the entire key combination, and not individual components of the
shard key. To enforce uniqueness on the shard key values, pass the
unique
parameter as true
to the sh.shardCollection()
method:
- If the collection is empty,
sh.shardCollection()
creates the unique index on the shard key if such an index does not already exists. - If the collection is not empty, you must create the index first
before using
sh.shardCollection()
.
Although you can have a unique compound index where the shard
key is a prefix, if using unique
parameter, the collection must have a unique index that is on the shard
key.
You cannot specify a unique constraint on a hashed index.
Choosing a Shard Key¶
The choice of shard key affects how the sharded cluster balancer creates and distributes chunks across the available shards. This affects the overall efficiency and performance of operations within the sharded cluster.
The shard key affects the performance and efficiency of the sharding strategy used by the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster.
At minimum, consider the consequences of the cardinality, frequency, and rate of change of a potential shard key.
Restrictions¶
For restrictions on shard key, see Shard Key Limitations.
Collection Size¶
When sharding a collection that is not empty, the shard key can
constrain the maximum supported collection size for the initial
sharding operation only. See Sharding Existing Collection Data
Size
.
Important
A sharded collection can grow to any size after successful sharding.
Shard Key Cardinality¶
The cardinality of a shard key determines the maximum number of chunks the balancer can create. This can reduce or remove the effectiveness of horizontal scaling in the cluster.
A unique shard key value can exist on no more than a single chunk at any
given time. If a shard key has a cardinality of 4
, then there can
be no more than 4
chunks within the sharded cluster, each storing
one unique shard key value. This constrains the number of effective
shards in the cluster to 4
as well - adding additional shards would
not provide any benefit.
The following image illustrates a sharded cluster using the field
X
as the shard key. If X
has low cardinality, the distribution of
inserts may look similar to the following:
The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.
A shard key with high cardinality does not guarantee even distribution of data across the sharded cluster, though it does better facilitate horizontal scaling. The frequency and rate of change of the shard key also contributes to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has low cardinality, consider using a compound index using a field that has higher relative cardinality.
Shard Key Frequency¶
Consider a set representing the range of shard key values - the frequency
of the shard key represents how often a given value occurs in the data. If the
majority of documents contain only a subset of those values, then the chunks
storing those documents become a bottleneck within the cluster. Furthermore,
as those chunks grow, they may become indivisible chunks
as they cannot be split any further. This reduces or removes the effectiveness
of horizontal scaling within the cluster.
The following image illustrates a sharded cluster using the field X
as the
shard key. If a subset of values for X
occur with high frequency, the
distribution of inserts may look similar to the following:
A shard key with low frequency does not guarantee even distribution of data across the sharded cluster. The cardinality and rate of change of the shard key also contributes to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that has high frequency values, consider using a compound index using a unique or low frequency value.
Monotonically Changing Shard Keys¶
A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.
This occurs because every cluster has a chunk that captures a range with an
upper bound of maxKey
. maxKey
always
compares as higher than all other values. Similarly, there is a chunk that
captures a range with a lower bound of minKey
.
minKey
always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to the
chunk with maxKey
as the upper bound. If the shard key value is always
decreasing, all new inserts are routed to the chunk with minKey
as the
lower bound. The shard containing that chunk becomes the bottleneck for write
operations.
The following image illustrates a sharded cluster using the field X
as the shard key. If the values for X
are monotonically increasing, the
distribution of inserts may look similar to the following:
If the shard key value was monotonically decreasing, then all inserts would
route to Chunk A
instead.
A shard key that does not change monotonically does not guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contributes to data distribution. Consider each factor when choosing a shard key.
If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.