Hashed Sharding
New in version 2.4.
Hashed sharding uses a hashed index of a single field as the shard key to partition data across your sharded cluster.
Hashed sharding provides more even data distribution across the sharded
cluster at the cost of reduced query isolation. After hashing, documents
with "close" shard key values are unlikely to be on the same chunk or
shard, so the mongos is more likely to perform broadcast operations to
fulfill a given ranged query. The mongos can still target queries with
equality matches to a single shard.
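The targeting behavior described above can be illustrated with a small standalone simulation. This is a sketch under stated assumptions, not a model of the real router: `standInHash` is a generic 32-bit integer mixer used purely for illustration (it is not MongoDB's hashing function), and the four-shard layout and the helpers `shardFor`, `shardsForEquality`, and `shardsHoldingRange` are hypothetical.

```javascript
// Illustrative only: standInHash is a generic 32-bit mixer, NOT MongoDB's hash.
const NUM_SHARDS = 4; // hypothetical cluster size

function standInHash(x) {
  let h = x >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  return (h ^ (h >>> 16)) >>> 0;
}

// Each shard owns one equal slice of the 32-bit hashed key space.
function shardFor(key) {
  return Math.floor(standInHash(key) / (2 ** 32 / NUM_SHARDS));
}

// Equality match: hashing the value identifies exactly one owning shard.
function shardsForEquality(key) {
  return [shardFor(key)];
}

// Ranged query: neighbouring keys hash to unrelated positions, so the
// matching documents may sit on any shard.
function shardsHoldingRange(low, high) {
  const owners = new Set();
  for (let key = low; key < high; key++) owners.add(shardFor(key));
  return [...owners].sort((a, b) => a - b);
}

console.log("equality x = 1042       ->", shardsForEquality(1042));
console.log("range 1000 <= x < 1100 ->", shardsHoldingRange(1000, 1100));
```

Because the matches for the ranged query are scattered, a router in this simplified model has no way to narrow the query to one shard, which is the loss of query isolation noted above.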
If you shard an empty collection using a hashed shard key, MongoDB
automatically creates two empty chunks per shard, to cover the entire range
of the hashed shard key value across the cluster. You can control how many
chunks MongoDB creates with the numInitialChunks parameter to
shardCollection or by manually creating chunks on the empty collection
using the split command.
Tip
MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.
Hashed Sharding Shard Key
The field you choose as your hashed shard key should have high
cardinality, that is, a large number of distinct values.
Hashed keys are ideal for shard keys with fields that change
monotonically, like ObjectId values or timestamps. A good example of this
is the default _id field, assuming it only contains ObjectId values.
To shard a collection using a hashed shard key, see Deploy Sharded Cluster using Hashed Sharding.
Hashed vs Ranged Sharding
Consider a collection that uses a monotonically increasing value X as the
shard key. With ranged sharding, incoming inserts are distributed
unevenly. Since the value of X is always increasing, the chunk with an
upper bound of maxKey receives the majority of incoming writes. This
restricts insert operations to the single shard containing this chunk, which
reduces or removes the advantage of distributed writes in a sharded cluster.
With a hashed index on X, consecutive values of X are unlikely to hash to
the same chunk, so inserts are spread across the chunks of the collection.
Since the data is now distributed more evenly, inserts are efficiently
distributed throughout the cluster.
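The contrast between the two strategies can be sketched with a standalone simulation. Everything here is an assumption made for illustration: `standInHash` is a generic 32-bit integer mixer rather than MongoDB's hashing function, the split points and four-shard layout are hypothetical, and chunk splitting and balancing are ignored entirely.

```javascript
// Illustrative only: standInHash is a generic 32-bit mixer, NOT MongoDB's hash,
// and the split points below are hypothetical.
const NUM_SHARDS = 4;

function standInHash(x) {
  let h = x >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  return (h ^ (h >>> 16)) >>> 0;
}

// Ranged sharding: split points were fixed by earlier data, and the last
// chunk runs from 750 up to maxKey.
const splitPoints = [250, 500, 750];
function rangedShard(x) {
  const i = splitPoints.findIndex((upper) => x < upper);
  return i === -1 ? splitPoints.length : i;
}

// Hashed sharding: each shard owns an equal slice of the hashed key space.
function hashedShard(x) {
  return Math.floor(standInHash(x) / (2 ** 32 / NUM_SHARDS));
}

// Insert `count` documents with X = firstX, firstX + 1, ... and count
// how many land on each shard.
function countInserts(shardOf, firstX, count) {
  const perShard = new Array(NUM_SHARDS).fill(0);
  for (let x = firstX; x < firstX + count; x++) perShard[shardOf(x)]++;
  return perShard;
}

const ranged = countInserts(rangedShard, 1000, 10000);
const hashed = countInserts(hashedShard, 1000, 10000);
console.log("inserts per shard, ranged:", ranged);
console.log("inserts per shard, hashed:", hashed);
```

In the ranged case every new X exceeds the last split point, so all inserts land on the final shard. In the hashed case the counts are spread across all four.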
Shard the Collection
Use the sh.shardCollection() method, specifying the full namespace
of the collection and the target hashed index to use as the shard key.
sh.shardCollection( "database.collection", { <field> : "hashed" } )
Specify the Initial Number of Chunks
If you shard an empty collection using a hashed shard key, MongoDB
automatically creates and migrates empty chunks so that each shard
has two chunks. To control how many chunks MongoDB creates when
sharding the collection, use shardCollection with the
numInitialChunks parameter.
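As a rough sketch, the command document for shardCollection with numInitialChunks might be assembled as shown below. The helper `buildShardCollectionCommand`, the field name `userId`, and the chunk count of 8 are hypothetical values chosen for illustration. The code only builds and prints the document so that it runs without a cluster.

```javascript
// Hypothetical helper: builds a shardCollection command document for a
// hashed shard key with an explicit initial chunk count.
function buildShardCollectionCommand(namespace, field, numInitialChunks) {
  return {
    shardCollection: namespace,    // full "<database>.<collection>" namespace
    key: { [field]: "hashed" },    // hashed shard key on a single field
    numInitialChunks,              // only meaningful for an empty collection
  };
}

// "userId" and 8 are placeholder values, not recommendations.
const command = buildShardCollectionCommand("database.collection", "userId", 8);
console.log(JSON.stringify(command, null, 2));

// In the mongo shell, connected to a mongos, the document would typically
// be submitted with:
//   db.adminCommand(command)
```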
Warning
MongoDB hashed indexes truncate floating point numbers to 64-bit integers
before hashing. For example, a hashed index would store the same value for
a field that held a value of 2.3, 2.2, and 2.9.
To prevent collisions, do not use a hashed index for floating point
numbers that cannot be reliably converted to 64-bit integers (and then
back to floating point). MongoDB hashed indexes do not support floating
point values larger than 2^53.
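The collision described in this warning can be reproduced with plain number arithmetic. The sketch below mimics only the truncation step: `truncateForHashing` is a hypothetical stand-in, not MongoDB's implementation, and no hash is actually computed.

```javascript
// Hypothetical stand-in for the truncation applied before hashing.
function truncateForHashing(value) {
  return Math.trunc(value);
}

// 2.3, 2.2, and 2.9 all truncate to the same integer, so they would
// produce the same hashed value.
const inputs = [2.3, 2.2, 2.9];
const truncated = inputs.map(truncateForHashing);
const collide = new Set(truncated).size === 1;
console.log("truncated values:", truncated, "collide:", collide);

// Above 2^53, a double can no longer represent every integer, so distinct
// integers become indistinguishable before any hashing takes place.
const limit = 2 ** 53;
const indistinguishable = limit + 1 === limit;
console.log("2^53 + 1 === 2^53:", indistinguishable);
```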