# Map-Reduce and Sharded Collections
Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of `mapReduce` specific to sharded collections.
## Sharded Collection as Input
When using a sharded collection as the input for a map-reduce operation, `mongos` automatically dispatches the map-reduce job to each shard in parallel; no special option is required. `mongos` waits for jobs on all shards to finish.
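
For illustration, here is a minimal mongo shell sketch, assuming a hypothetical sharded `orders` collection with `custId` and `amount` fields. The call is identical to one against an unsharded collection:

```javascript
// "orders" is a hypothetical sharded collection. No sharding-specific
// options are needed; mongos dispatches the job to each shard in parallel.
db.orders.mapReduce(
  function () { emit(this.custId, this.amount); },       // map
  function (key, values) { return Array.sum(values); },  // reduce
  { out: "order_totals" }                                // unsharded output
);
```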
## Sharded Collection as Output
If the `out` field for `mapReduce` has the `sharded` value, MongoDB shards the output collection using the `_id` field as the shard key.
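
For example, the following sketch (the `events` collection and `daily_counts` output name are assumptions) requests a sharded output collection by setting `sharded: true` in the `out` document:

```javascript
// Setting sharded: true in the out document makes MongoDB shard the
// output collection on the _id field (here, the emitted day value).
db.events.mapReduce(
  function () { emit(this.day, 1); },                    // map
  function (key, values) { return Array.sum(values); },  // reduce
  { out: { replace: "daily_counts", sharded: true } }
);
```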
To output to a sharded collection:
- If the output collection does not exist, MongoDB creates and shards the collection on the `_id` field.
- For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.
- `mongos` dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk.
- During the post-processing, each shard will pull the results for its own chunks from the other shards, run the final reduce/finalize, and write locally to the output collection. (A sketch of an incremental run follows this list.)
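
As referenced above, a hypothetical incremental run against an existing sharded output collection might look like the following; the `ts` field, cutoff date, and collection names are assumptions:

```javascript
// Incremental run: only documents newer than lastRun are mapped, and the
// reduce action folds the new results into the existing sharded
// "daily_counts" collection. During post-processing, each shard
// re-reduces the documents for the chunks it owns.
var lastRun = ISODate("2012-01-01T00:00:00Z");  // placeholder cutoff
db.events.mapReduce(
  function () { emit(this.day, 1); },
  function (key, values) { return Array.sum(values); },
  {
    query: { ts: { $gt: lastRun } },             // assumes a ts field
    out: { reduce: "daily_counts", sharded: true }
  }
);
```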
Note
- During later map-reduce jobs, MongoDB splits chunks as needed.
- Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.
In MongoDB 2.0:
- `mongos` retrieves the results from each shard, performs a merge sort to order the results, and proceeds to the reduce/finalize phase as needed. `mongos` then writes the result to the output collection in sharded mode.
- This model requires only a small amount of memory, even for large data sets.
- Shard chunks are not automatically split during insertion. This requires manual intervention until the chunks are granular and balanced.
Important

For best results, only use the sharded output options for `mapReduce` in version 2.2 or later.