Aggregation Pipeline Optimization
Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
Optimizations are subject to change between releases.
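As a minimal sketch of requesting the optimizer's plan, the hypothetical helper `explainPipeline` below passes `explain: true` as an option to `aggregate()`. It assumes `db` is a shell-style database handle; the helper name and the `orders` collection in the usage comment are placeholders, not part of this page.

```javascript
// Hypothetical helper: ask the server to describe how it would run `pipeline`
// instead of returning the aggregation results.
// `db` is assumed to be a shell-style database handle.
function explainPipeline(db, collectionName, pipeline) {
  return db.getCollection(collectionName).aggregate(pipeline, { explain: true });
}

// Usage in the shell (collection name is a placeholder):
// explainPipeline(db, "orders", [
//   { $sort: { age: -1 } },
//   { $match: { status: "A" } }
// ]);
```

Because optimizations can change between releases, the explain output for the same pipeline may differ from one server version to another.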
Projection Optimization
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
Pipeline Sequence Optimization
$sort + $match Sequence Optimization
When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
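The reordering is safe because filtering does not depend on document order. The sketch below models the two stages on an in-memory array (plain JavaScript, not MongoDB's implementation; the sample documents are made up) and checks that both orders give the same result.

```javascript
// In-memory model of the two stages (not MongoDB's implementation).
const sortByAgeDesc = (docs) => [...docs].sort((a, b) => b.age - a.age);
const matchStatusA = (docs) => docs.filter((d) => d.status === "A");

// Made-up sample documents.
const people = [
  { name: "a", age: 30, status: "A" },
  { name: "b", age: 45, status: "B" },
  { name: "c", age: 25, status: "A" },
  { name: "d", age: 52, status: "A" }
];

// Original order: sort all four documents, then filter.
const sortedThenMatched = matchStatusA(sortByAgeDesc(people));
// Optimized order: filter first, so only three documents are sorted.
const matchedThenSorted = sortByAgeDesc(matchStatusA(people));

console.log(JSON.stringify(sortedThenMatched) === JSON.stringify(matchedThenSorted)); // prints true
```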
$skip + $limit Sequence Optimization
When you have a sequence with $skip followed by a $limit, the $limit moves before the $skip. With the reordering, the $limit value increases by the $skip amount.
For example, if the pipeline consists of the following stages:
{ $skip: 10 },
{ $limit: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $limit: 15 },
{ $skip: 10 }
This optimization allows for more opportunities for $sort + $limit Coalescence, such as with $sort + $skip + $limit sequences. See $sort + $limit Coalescence for details on the coalescence and $sort + $skip + $limit Sequence for an example.
For aggregation operations on sharded collections, this optimization reduces the results returned from each shard.
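The arithmetic behind the $skip + $limit reordering can be checked with array slices. This is an illustrative model only, using a made-up input of 40 numbers: skipping 10 and then keeping 5 selects the same elements as keeping 15 and then skipping 10.

```javascript
// Array-slice model of the two stages (illustrative only).
const skipDocs = (docs, n) => docs.slice(n);
const limitDocs = (docs, n) => docs.slice(0, n);

const numbers = Array.from({ length: 40 }, (_, i) => i); // 0..39

// { $skip: 10 }, { $limit: 5 }
const skipThenLimit = limitDocs(skipDocs(numbers, 10), 5);
// { $limit: 15 }, { $skip: 10 }  (limit grows by the skip amount)
const limitThenSkip = skipDocs(limitDocs(numbers, 10 + 5), 10);

console.log(skipThenLimit); // prints [ 10, 11, 12, 13, 14 ]
console.log(limitThenSkip); // prints [ 10, 11, 12, 13, 14 ]
```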
$redact + $match Sequence Optimization
When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of the pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Pipeline Operators and Indexes for more information.
For example, if the pipeline consists of the following stages:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add a portion of the $match stage (here, the year condition) before the $redact stage, while keeping the original $match stage in place:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
$project + $skip or $limit Sequence Optimization
New in version 3.2.
When you have a sequence with $project followed by either $skip or $limit, the $skip or $limit moves before $project. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $limit: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $sort: { age : -1 } },
{ $limit: 5 },
{ $project: { status: 1, name: 1 } }
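To illustrate why moving $limit ahead of $project helps, the sketch below models both stages on an in-memory array and counts how many documents get projected in each order. The data and helper names are made up, and the model keeps `_id` in the projection on the assumption that it is included by default.

```javascript
// Counts how many documents pass through the projection.
let projectedCount = 0;
const projectStatusName = (docs) =>
  docs.map(({ _id, status, name }) => {
    projectedCount += 1;
    return { _id, status, name };
  });
const limitFive = (docs) => docs.slice(0, 5);

// Made-up input: 20 documents with an extra `age` field.
const input = Array.from({ length: 20 }, (_, i) => ({
  _id: i, status: "A", name: "n" + i, age: 20 + i
}));

// Original order: project every document, then keep five.
const projectThenLimit = limitFive(projectStatusName(input));
const countProjectFirst = projectedCount;

// Optimized order: keep five, then project only those.
projectedCount = 0;
const limitThenProject = projectStatusName(limitFive(input));
const countLimitFirst = projectedCount;

console.log(countProjectFirst, countLimitFirst); // prints 20 5
```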
This optimization allows for more opportunities for $sort + $limit Coalescence, such as with the $sort + $project + $limit sequence in the example above. See $sort + $limit Coalescence for details on the coalescence.
Pipeline Coalescence Optimization
When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.
$sort + $limit Coalescence
When a $sort immediately precedes a $limit, the optimizer can coalesce the $limit into the $sort. This allows the sort operation to only maintain the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
[1] The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit.
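The idea of maintaining only the top n results can be sketched as a bounded insertion into a small sorted buffer. This is one simple way to realize it, not a description of MongoDB's internal algorithm; `topN` and the sample data are hypothetical.

```javascript
// Keep only the best `n` documents seen so far, in sorted order.
// The buffer never holds more than n + 1 items at once.
function topN(docs, n, compare) {
  const best = [];
  for (const doc of docs) {
    // Walk left to find the insertion point that keeps `best` sorted.
    let i = best.length;
    while (i > 0 && compare(doc, best[i - 1]) < 0) i -= 1;
    if (i < n) {
      best.splice(i, 0, doc);
      if (best.length > n) best.pop(); // drop the document that fell out of the top n
    }
  }
  return best;
}

const byAgeDesc = (a, b) => b.age - a.age;
const sample = [
  { age: 31 }, { age: 58 }, { age: 19 },
  { age: 44 }, { age: 27 }, { age: 63 }
];

// Equivalent to { $sort: { age: -1 } }, { $limit: 3 } on this sample.
console.log(topN(sample, 3, byAgeDesc).map((d) => d.age)); // prints [ 63, 58, 44 ]
```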
$limit + $limit Coalescence
When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:
{ $limit: 100 },
{ $limit: 10 }
Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.
{ $limit: 10 }
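A quick array-slice check of the minimum rule (an illustrative model with made-up input, not server code):

```javascript
const takeFirst = (docs, n) => docs.slice(0, n);
const items = Array.from({ length: 250 }, (_, i) => i);

// { $limit: 100 }, { $limit: 10 }
const twoLimits = takeFirst(takeFirst(items, 100), 10);
// { $limit: 10 }, the smaller of the two amounts
const oneLimit = takeFirst(items, Math.min(100, 10));

console.log(twoLimits.length, oneLimit.length); // prints 10 10
```

The order of the two stages does not matter: applying the limit of 10 first and the limit of 100 second also leaves 10 items.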
$skip + $skip Coalescence
When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:
{ $skip: 5 },
{ $skip: 2 }
Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2.
{ $skip: 7 }
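The sum rule can be checked the same way with array slices (illustrative model, made-up input):

```javascript
const dropFirst = (docs, n) => docs.slice(n);
const values = Array.from({ length: 12 }, (_, i) => i); // 0..11

// { $skip: 5 }, { $skip: 2 }
const twoSkips = dropFirst(dropFirst(values, 5), 2);
// { $skip: 7 }, the sum of the two amounts
const oneSkip = dropFirst(values, 5 + 2);

console.log(twoSkips); // prints [ 7, 8, 9, 10, 11 ]
console.log(oneSkip);  // prints [ 7, 8, 9, 10, 11 ]
```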
$match + $match Coalescence
When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:
{ $match: { year: 2014 } },
{ $match: { status: "A" } }
Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
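To see that the combined condition selects the same documents, the sketch below uses a deliberately simplified matcher that understands only field equality and $and (real query matching supports far more). The sample records are made up.

```javascript
// Simplified matcher: supports only field equality and $and.
function matches(doc, query) {
  return Object.entries(query).every(([key, value]) =>
    key === "$and" ? value.every((sub) => matches(doc, sub)) : doc[key] === value
  );
}
const matchDocs = (docs, query) => docs.filter((doc) => matches(doc, query));

const records = [
  { year: 2014, status: "A" },
  { year: 2014, status: "B" },
  { year: 2013, status: "A" }
];

// { $match: { year: 2014 } }, { $match: { status: "A" } }
const twoMatches = matchDocs(matchDocs(records, { year: 2014 }), { status: "A" });
// { $match: { $and: [ { year: 2014 }, { status: "A" } ] } }
const oneMatch = matchDocs(records, { $and: [{ year: 2014 }, { status: "A" }] });

console.log(JSON.stringify(twoMatches) === JSON.stringify(oneMatch)); // prints true
```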
$lookup + $unwind Coalescence
New in version 3.2.
When a $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
For example, a pipeline contains the following sequence:
{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y"
  }
},
{ $unwind: "$resultingArray" }
The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:
{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y",
    unwinding: { preserveNullAndEmptyArrays: false }
  }
}
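The benefit of the coalesced $lookup + $unwind stage can be illustrated with an in-memory join. This is a simplified model (strict equality on `x` and `y`, made-up data, hypothetical function names), not how the server performs the lookup: the first function materializes the full `resultingArray` per document before unwinding, while the second emits one output document per match and never builds the array.

```javascript
// Two-step version: build the whole array, then unwind it.
function lookupThenUnwind(localDocs, foreignDocs) {
  const joined = localDocs.map((d) => ({
    ...d,
    resultingArray: foreignDocs.filter((f) => f.y === d.x) // large intermediate array
  }));
  // Documents with an empty array produce no output
  // (the preserveNullAndEmptyArrays: false behavior).
  return joined.flatMap((d) =>
    d.resultingArray.map((item) => ({ ...d, resultingArray: item }))
  );
}

// Coalesced version: emit one document per match as it is found.
function lookupWithUnwinding(localDocs, foreignDocs) {
  const out = [];
  for (const d of localDocs) {
    for (const f of foreignDocs) {
      if (f.y === d.x) out.push({ ...d, resultingArray: f });
    }
  }
  return out;
}

const localDocs = [{ _id: 1, x: "a" }, { _id: 2, x: "b" }, { _id: 3, x: "z" }];
const foreignDocs = [{ y: "a", v: 1 }, { y: "a", v: 2 }, { y: "b", v: 3 }];

console.log(lookupWithUnwinding(localDocs, foreignDocs).length); // prints 3
```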
Examples
The following examples are some sequences that can take advantage of both sequence reordering and coalescence. Generally, coalescence occurs after any sequence reordering optimization.
$sort + $skip + $limit Sequence
A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:
{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }
First, the optimizer performs the $skip + $limit Sequence Optimization to transform the sequence to the following:
{ $sort: { age : -1 } },
{ $limit: 15 },
{ $skip: 10 }
The $skip + $limit Sequence Optimization increases the $limit amount with the reordering. See $skip + $limit Sequence Optimization for details.
The reordered sequence now has $sort immediately preceding the $limit, and the pipeline can coalesce the two stages to decrease memory usage during the sort operation. See $sort + $limit Coalescence for more information.
$limit + $skip + $limit + $skip Sequence
A pipeline contains a sequence of alternating $limit and $skip stages:
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
The $skip + $limit Sequence Optimization reverses the position of the { $skip: 5 } and { $limit: 10 } stages and increases the limit amount:
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
The optimizer then coalesces the two $limit stages into a single $limit stage and the two $skip stages into a single $skip stage. The resulting sequence is the following:
{ $limit: 15 },
{ $skip: 7 }
See $limit + $limit Coalescence and $skip + $skip Coalescence for details.
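The reorder-then-coalesce steps in this example can be reproduced with a toy rewrite pass. The `optimize` function below is a hypothetical sketch that covers only the $skip and $limit rules described on this page; it is not MongoDB's optimizer and ignores every other stage type.

```javascript
// Toy rewrite pass: applies one rule at a time to adjacent stages
// until no rule applies.
function optimize(pipeline) {
  const stages = pipeline.map((stage) => ({ ...stage }));
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i + 1 < stages.length; i += 1) {
      const a = stages[i];
      const b = stages[i + 1];
      if ("$skip" in a && "$limit" in b) {
        // $skip + $limit reordering: the limit grows by the skip amount.
        stages.splice(i, 2, { $limit: b.$limit + a.$skip }, a);
        changed = true;
      } else if ("$limit" in a && "$limit" in b) {
        // $limit + $limit coalescence: keep the smaller amount.
        stages.splice(i, 2, { $limit: Math.min(a.$limit, b.$limit) });
        changed = true;
      } else if ("$skip" in a && "$skip" in b) {
        // $skip + $skip coalescence: add the amounts.
        stages.splice(i, 2, { $skip: a.$skip + b.$skip });
        changed = true;
      }
      if (changed) break; // rescan from the start after each rewrite
    }
  }
  return stages;
}

console.log(JSON.stringify(optimize([
  { $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }
]))); // prints [{"$limit":15},{"$skip":7}]
```

Applying one rewrite per scan keeps the sketch easy to follow: the first pass swaps { $skip: 5 } and { $limit: 10 }, the second merges the two limits, and the third merges the two skips.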
See also: the explain option in the db.collection.aggregate() method.