- Sharding >
- Data Partitioning with Chunks >
- Split Chunks in a Sharded Cluster
Split Chunks in a Sharded Cluster¶
Normally, MongoDB splits a chunk after an insert if the chunk exceeds the maximum chunk size. However, you may want to split chunks manually if:
- you have a large amount of data in your cluster and very few chunks, as is the case after deploying a cluster using existing data.
- you expect to add a large amount of data that would initially reside
in a single chunk or shard. For example, you plan to insert a large
amount of data with shard key values between
300
and400
, but all values of your shard keys are between250
and500
are in a single chunk.
Note
New in version 2.6: MongoDB provides the mergeChunks
command
to combine contiguous chunk ranges into a single chunk. See
Merge Chunks in a Sharded Cluster for more
information.
The balancer may migrate recently split chunks to a new shard immediately if the move benefits future insertions. The balancer does not distinguish between chunks split manually and those split automatically by the system.
Warning
Be careful when splitting data in a sharded collection to create new chunks. When you shard a collection that has existing data, MongoDB automatically creates chunks to evenly distribute the collection. To split data effectively in a sharded cluster you must consider the number of documents in a chunk and the average document size to create a uniform chunk size. When chunks have irregular sizes, shards may have an equal number of chunks but have very different data sizes. Avoid creating splits that lead to a collection with differently sized chunks.
Use sh.status()
to determine the current chunk ranges across
the cluster.
To split chunks manually, use the split
command with either
fields middle
or find
. The mongo
shell provides the
helper methods sh.splitFind()
and sh.splitAt()
.
splitFind()
splits the chunk that contains the first
document returned that matches this query into two equally sized chunks.
You must specify the full namespace (i.e. “<database>.<collection>
”)
of the sharded collection to splitFind()
. The query in
splitFind()
does not need to use the shard key, though it
nearly always makes sense to do so.
Example
The following command splits the chunk that contains the value of
63109
for the zipcode
field in the people
collection of
the records
database:
sh.splitFind( "records.people", { "zipcode": "63109" } )
Use splitAt()
to split a chunk in two, using the queried
document as the lower bound in the new chunk:
Example
The following command splits the chunk that contains the value of
63109
for the zipcode
field in the people
collection of
the records
database.
sh.splitAt( "records.people", { "zipcode": "63109" } )
Note
splitAt()
does not necessarily split the chunk
into two equally sized chunks. The split occurs at the location of
the document matching the query, regardless of where that document is
in the chunk.