Multikey Index Bounds

The bounds of an index scan define the portions of an index to search during a query. When multiple predicates over an index exist, MongoDB will attempt to combine the bounds for these predicates, by either intersection or compounding, in order to produce a scan with smaller bounds.

Intersect Bounds for Multikey Index

Bounds intersection refers to a logical conjunction (i.e. AND) of multiple bounds. For instance, given two bounds [ [ 3, Infinity ] ] and [ [ -Infinity, 6 ] ], the intersection of the bounds results in [ [ 3, 6 ] ].
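The intersection above can be sketched in a few lines of plain JavaScript. This is only an illustration of the arithmetic, not MongoDB's planner code; the helper name intersectBounds is hypothetical, and each bound is modeled as a closed interval [low, high].

```javascript
// Hypothetical helper: intersect two closed intervals [low, high].
// Returns null when the intervals do not overlap.
function intersectBounds(a, b) {
  const low = Math.max(a[0], b[0]);
  const high = Math.min(a[1], b[1]);
  return low <= high ? [low, high] : null;
}

const gte3 = [3, Infinity];   // bounds for a "greater than or equal to 3" predicate
const lte6 = [-Infinity, 6];  // bounds for a "less than or equal to 6" predicate

const intersected = intersectBounds(gte3, lte6);
console.log(intersected); // [ 3, 6 ]
```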

Given an indexed array field, consider a query that specifies multiple predicates on the array and can use a multikey index. MongoDB can intersect multikey index bounds if an $elemMatch joins the predicates.

For example, a collection survey contains documents with a field item and an array field ratings:

{ _id: 1, item: "ABC", ratings: [ 2, 9 ] }
{ _id: 2, item: "XYZ", ratings: [ 4, 3 ] }

Create a multikey index on the ratings array:

db.survey.createIndex( { ratings: 1 } )

The following query uses $elemMatch to require that the array contains at least one element that matches both conditions:

db.survey.find( { ratings : { $elemMatch: { $gte: 3, $lte: 6 } } } )

Taking the predicates separately:

  • the bounds for the greater than or equal to 3 predicate (i.e. $gte: 3) are [ [ 3, Infinity ] ];
  • the bounds for the less than or equal to 6 predicate (i.e. $lte: 6) are [ [ -Infinity, 6 ] ].

Because the query uses $elemMatch to join these predicates, MongoDB can intersect the bounds to:

ratings: [ [ 3, 6 ] ]
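One way to check which bounds were actually chosen is to look at the indexBounds field of the IXSCAN stage in the output of explain(). The sketch below assumes the usual explain layout (a winningPlan made of nested stages); the helper findIndexBounds is hypothetical, and samplePlan is a hand-written, abbreviated stand-in for real explain output, not output captured from a server.

```javascript
// Hypothetical helper: walk a plan object and return the indexBounds of the
// first IXSCAN stage found, or null if there is none.
function findIndexBounds(node) {
  if (node === null || typeof node !== "object") return null;
  if (node.stage === "IXSCAN" && node.indexBounds) return node.indexBounds;
  for (const value of Object.values(node)) {
    const found = findIndexBounds(value);
    if (found) return found;
  }
  return null;
}

// Hand-written, abbreviated stand-in for the winning plan of the
// $elemMatch query above (assumed shape, for illustration only).
const samplePlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    keyPattern: { ratings: 1 },
    isMultiKey: true,
    indexBounds: { ratings: ["[3.0, 6.0]"] },
  },
};

console.log(findIndexBounds(samplePlan)); // { ratings: [ '[3.0, 6.0]' ] }
```

In the shell, the same helper could be applied to the queryPlanner.winningPlan document returned by db.survey.find( ... ).explain().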

If the query does not join the conditions on the array field with $elemMatch, MongoDB cannot intersect the multikey index bounds. Consider the following query:

db.survey.find( { ratings : { $gte: 3, $lte: 6 } } )

The query searches the ratings array for at least one element greater than or equal to 3 and at least one element less than or equal to 6. Because a single element does not need to meet both criteria, MongoDB does not intersect the bounds and uses either [ [ 3, Infinity ] ] or [ [ -Infinity, 6 ] ]. MongoDB makes no guarantee as to which of these two bounds it chooses.
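The difference between the two queries can be reproduced in plain JavaScript against the two sample documents. This is a sketch of the matching semantics only (it does not touch an index or a server), and the variable names are illustrative.

```javascript
// Sample documents from the survey collection.
const survey = [
  { _id: 1, item: "ABC", ratings: [2, 9] },
  { _id: 2, item: "XYZ", ratings: [4, 3] },
];

// Mimics { ratings: { $elemMatch: { $gte: 3, $lte: 6 } } }:
// a single element must satisfy both conditions.
const withElemMatch = survey
  .filter((doc) => doc.ratings.some((r) => r >= 3 && r <= 6))
  .map((doc) => doc._id);

// Mimics { ratings: { $gte: 3, $lte: 6 } }:
// each condition may be satisfied by a different element.
const withoutElemMatch = survey
  .filter(
    (doc) => doc.ratings.some((r) => r >= 3) && doc.ratings.some((r) => r <= 6)
  )
  .map((doc) => doc._id);

console.log(withElemMatch);    // [ 2 ]
console.log(withoutElemMatch); // [ 1, 2 ]
```

Document 1 matches the second query through two different elements (9 and 2), neither of which lies in [ 3, 6 ], which is why scanning only the intersected range would miss it.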

Compound Bounds for Multikey Index

Compounding bounds refers to using bounds for multiple keys of a compound index. For instance, given a compound index { a: 1, b: 1 } with bounds on field a of [ [ 3, Infinity ] ] and bounds on field b of [ [ -Infinity, 6 ] ], compounding the bounds results in the use of both bounds:

{ a: [ [ 3, Infinity ] ], b: [ [ -Infinity, 6 ] ] }

If MongoDB cannot compound the two bounds, MongoDB always constrains the index scan by the bound on its leading field, in this case, a: [ [ 3, Infinity ] ].

Compound Index on an Array Field

Consider a compound multikey index; i.e. a compound index where one of the indexed fields is an array. For example, a collection survey contains documents with a field item and an array field ratings:

{ _id: 1, item: "ABC", ratings: [ 2, 9 ] }
{ _id: 2, item: "XYZ", ratings: [ 4, 3 ] }

Create a compound index on the item field and the ratings field:

db.survey.createIndex( { item: 1, ratings: 1 } )

The following query specifies a condition on both keys of the index:

db.survey.find( { item: "XYZ", ratings: { $gte: 3 } } )

Taking the predicates separately:

  • the bounds for the item: "XYZ" predicate are [ [ "XYZ", "XYZ" ] ];
  • the bounds for the ratings: { $gte: 3 } predicate are [ [ 3, Infinity ] ].

MongoDB can compound the two bounds to use the combined bounds of:

{ item: [ [ "XYZ", "XYZ" ] ], ratings: [ [ 3, Infinity ] ] }
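To see what the compounded bounds select, it can help to model the index as a sorted list of keys, with one key per array element. The following is a simplified model for illustration, not how the storage engine lays out index entries; the names surveyDocs, indexKeys, and scannedKeys are hypothetical.

```javascript
const surveyDocs = [
  { _id: 1, item: "ABC", ratings: [2, 9] },
  { _id: 2, item: "XYZ", ratings: [4, 3] },
];

// A multikey index holds one key per array element, ordered by
// item first and then by rating.
const indexKeys = surveyDocs
  .flatMap((doc) =>
    doc.ratings.map((rating) => ({ item: doc.item, rating, _id: doc._id }))
  )
  .sort((a, b) =>
    a.item < b.item ? -1 : a.item > b.item ? 1 : a.rating - b.rating
  );

// Compounded bounds: item in [ "XYZ", "XYZ" ] and ratings in [ 3, Infinity ].
const scannedKeys = indexKeys.filter(
  (key) => key.item === "XYZ" && key.rating >= 3
);

console.log(indexKeys.map((k) => `${k.item}:${k.rating}`));
// [ 'ABC:2', 'ABC:9', 'XYZ:3', 'XYZ:4' ]
console.log(scannedKeys.map((k) => `${k.item}:${k.rating}`));
// [ 'XYZ:3', 'XYZ:4' ]
```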

Range Queries on a Scalar Indexed Field (WiredTiger)

Changed in version 3.4: The following applies to the WiredTiger and In-Memory storage engines only.

Starting in MongoDB 3.4, for multikey indexes created using MongoDB 3.4 or later, MongoDB keeps track of which indexed field or fields cause an index to be a multikey index. Tracking this information allows the MongoDB query engine to use tighter index bounds.

The aforementioned compound index is on the scalar field [1] item and the array field ratings:

db.survey.createIndex( { item: 1, ratings: 1 } )

For the WiredTiger and the In-Memory storage engines, if a query operation specifies multiple predicates on the indexed scalar field(s) of a compound multikey index created in MongoDB 3.4 or later, MongoDB will intersect the bounds for the field.

For example, the following operation specifies a range query on the scalar field as well as a range query on the array field:

db.survey.find( {
   item: { $gte: "L", $lte: "Z"}, ratings : { $elemMatch: { $gte: 3, $lte: 6 } }
} )

MongoDB will intersect the bounds for item to [ [ "L", "Z" ] ] and ratings to [[3.0, 6.0]] to use the combined bounds of:

"item" : [ [ "L", "Z" ] ], "ratings" : [ [3.0, 6.0] ]

As another example, consider a case where the scalar fields belong to a nested document. For instance, a collection survey contains the following documents:

{ _id: 1, item: { name: "ABC", manufactured: 2016 }, ratings: [ 2, 9 ] }
{ _id: 2, item: { name: "XYZ", manufactured: 2013 },  ratings: [ 4, 3 ] }

Create a compound multikey index on the scalar fields "item.name" and "item.manufactured" and the array field ratings:

db.survey.createIndex( { "item.name": 1, "item.manufactured": 1, ratings: 1 } )

Consider the following operation that specifies query predicates on the scalar fields:

db.survey.find( {
   "item.name": "L" ,
   "item.manufactured": 2012
} )

For this query, MongoDB can use the combined bounds of:

"item.name" : [ ["L", "L"] ], "item.manufactured" : [ [2012.0, 2012.0] ]

Earlier versions of MongoDB cannot combine these bounds for the scalar fields.

[1] A scalar field is a field whose value is neither a document nor an array; e.g. a field whose value is a string or an integer is a scalar field.

A scalar field can be a field nested in a document, as long as the field itself is not an array or a document. For example, in the document { a: { b: { c: 5, d: 5 } } }, c and d are scalar fields whereas a and b are not.
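The definition can be expressed as a small check in plain JavaScript. This sketch only covers plain JSON-like values (it does not model BSON types such as dates or ObjectIds), and the helpers isScalarValue and getPath are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: a value is scalar if it is neither an array
// nor a document (plain JSON-like values only).
function isScalarValue(value) {
  return value === null || typeof value !== "object";
}

// Hypothetical helper: resolve a dotted field path in a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), doc);
}

const nestedDoc = { a: { b: { c: 5, d: 5 } } };

console.log(isScalarValue(getPath(nestedDoc, "a.b.c"))); // true
console.log(isScalarValue(getPath(nestedDoc, "a.b.d"))); // true
console.log(isScalarValue(getPath(nestedDoc, "a.b")));   // false
console.log(isScalarValue(getPath(nestedDoc, "a")));     // false
```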

Range Queries on the Scalar Indexed Field (MMAPv1)

For the MMAPv1 storage engine, MongoDB cannot combine bounds for the scalar field for a compound multikey index, even if the query is only on the scalar field.

Compound Index on Fields from an Array of Embedded Documents

If an array contains embedded documents, to index on fields contained in the embedded documents, use the dotted field name in the index specification. For instance, given the following array of embedded documents:

ratings: [ { score: 2, by: "mn" }, { score: 9, by: "anon" } ]

The dotted field name for the score field is "ratings.score".

Compound Bounds of Non-array Field and Field from an Array

Consider a collection survey2 that contains documents with a field item and an array field ratings whose elements are embedded documents:

{
  _id: 1,
  item: "ABC",
  ratings: [ { score: 2, by: "mn" }, { score: 9, by: "anon" } ]
}
{
  _id: 2,
  item: "XYZ",
  ratings: [ { score: 5, by: "anon" }, { score: 7, by: "wv" } ]
}

Create a compound index on the non-array field item as well as on two fields from the array, ratings.score and ratings.by:

db.survey2.createIndex( { "item": 1, "ratings.score": 1, "ratings.by": 1 } )

The following query specifies a condition on all three fields:

db.survey2.find( { item: "XYZ",  "ratings.score": { $lte: 5 }, "ratings.by": "anon" } )

Taking the predicates separately:

  • the bounds for the item: "XYZ" predicate are [ [ "XYZ", "XYZ" ] ];
  • the bounds for the score: { $lte: 5 } predicate are [ [ -Infinity, 5 ] ];
  • the bounds for the by: "anon" predicate are [ [ "anon", "anon" ] ].

MongoDB can compound the bounds for the item key with either the bounds for "ratings.score" or the bounds for "ratings.by", depending upon the query predicates and the index key values. MongoDB makes no guarantee as to which bounds it compounds with the item field. For instance, MongoDB may choose to compound the item bounds with the "ratings.score" bounds:

{
  "item" : [ [ "XYZ", "XYZ" ] ],
  "ratings.score" : [ [ -Infinity, 5 ] ],
  "ratings.by" : [ [ MinKey, MaxKey ] ]
}

Or, MongoDB may choose to compound the item bounds with "ratings.by" bounds:

{
  "item" : [ [ "XYZ", "XYZ" ] ],
  "ratings.score" : [ [ MinKey, MaxKey ] ],
  "ratings.by" : [ [ "anon", "anon" ] ]
}

However, to compound the bounds for "ratings.score" with the bounds for "ratings.by", the query must use $elemMatch. See Compound Bounds of Index Fields from an Array for more information.

Compound Bounds of Index Fields from an Array

To compound together the bounds for index keys from the same array:

  • the index keys must share the same field path up to but excluding the field names, and
  • the query must specify predicates on the fields using $elemMatch on that path.

For a field in an embedded document, the dotted field name, such as "a.b.c.d", is the field path for d. To compound the bounds for index keys from the same array, the $elemMatch must be on the path up to but excluding the field name itself; i.e. "a.b.c".
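The path rule can be sketched as a small helper that strips the final field name from each index key and checks that the remaining paths agree. The function name elemMatchPath is hypothetical, and the sketch only looks at the key names; it does not know which parts of the path are actually arrays.

```javascript
// Hypothetical helper: given dotted index key names, return the shared
// path (everything up to but excluding the final field name) on which
// $elemMatch would need to be specified, or null if the keys do not
// share one.
function elemMatchPath(fieldPaths) {
  const parents = fieldPaths.map((p) => p.split(".").slice(0, -1).join("."));
  const shared = parents.every((p) => p === parents[0]);
  return shared && parents[0] !== "" ? parents[0] : null;
}

console.log(elemMatchPath(["a.b.c.d"]));                                // a.b.c
console.log(elemMatchPath(["ratings.score", "ratings.by"]));            // ratings
console.log(elemMatchPath(["ratings.scores.q1", "ratings.scores.q2"])); // ratings.scores
console.log(elemMatchPath(["item", "ratings.score"]));                  // null
```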

For instance, create a compound index on the ratings.score and the ratings.by fields:

db.survey2.createIndex( { "ratings.score": 1, "ratings.by": 1 } )

The fields "ratings.score" and "ratings.by" share the field path ratings. The following query uses $elemMatch on the field ratings to require that the array contains at least one element that matches both conditions:

db.survey2.find( { ratings: { $elemMatch: { score: { $lte: 5 }, by: "anon" } } } )

Taking the predicates separately:

  • the bounds for the score: { $lte: 5 } predicate are [ [ -Infinity, 5 ] ];
  • the bounds for the by: "anon" predicate are [ [ "anon", "anon" ] ].

MongoDB can compound the two bounds to use the combined bounds of:

{ "ratings.score" : [ [ -Infinity, 5 ] ], "ratings.by" : [ [ "anon", "anon" ] ] }

Query Without $elemMatch

If the query does not join the conditions on the indexed array fields with $elemMatch, MongoDB cannot compound their bounds. Consider the following query:

db.survey2.find( { "ratings.score": { $lte: 5 }, "ratings.by": "anon" } )

Because a single embedded document in the array does not need to meet both criteria, MongoDB does not compound the bounds. When using a compound index, if MongoDB cannot constrain all the fields of the index, MongoDB always constrains the leading field of the index, in this case "ratings.score":

{
  "ratings.score": [ [ -Infinity, 5 ] ],
  "ratings.by": [ [ MinKey, MaxKey ] ]
}
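The reason the bounds cannot be compounded shows up when the two query forms are evaluated against the sample survey2 documents. The following plain JavaScript mimics the matching semantics only; it is an illustrative sketch, and the variable names are hypothetical.

```javascript
const survey2 = [
  {
    _id: 1,
    item: "ABC",
    ratings: [{ score: 2, by: "mn" }, { score: 9, by: "anon" }],
  },
  {
    _id: 2,
    item: "XYZ",
    ratings: [{ score: 5, by: "anon" }, { score: 7, by: "wv" }],
  },
];

// Mimics { ratings: { $elemMatch: { score: { $lte: 5 }, by: "anon" } } }:
// one embedded document must satisfy both conditions.
const sameElement = survey2
  .filter((doc) => doc.ratings.some((r) => r.score <= 5 && r.by === "anon"))
  .map((doc) => doc._id);

// Mimics { "ratings.score": { $lte: 5 }, "ratings.by": "anon" }:
// the conditions may be met by different embedded documents.
const anyElements = survey2
  .filter(
    (doc) =>
      doc.ratings.some((r) => r.score <= 5) &&
      doc.ratings.some((r) => r.by === "anon")
  )
  .map((doc) => doc._id);

console.log(sameElement); // [ 2 ]
console.log(anyElements); // [ 1, 2 ]
```

Document 1 matches the second form only because its score of 2 and its "anon" rating come from different array elements, so no single index key satisfies both bounds.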

$elemMatch on Incomplete Path

If the query does not specify $elemMatch on the path of the embedded fields, up to but excluding the field names, MongoDB cannot compound the bounds of index keys from the same array.

For example, a collection survey3 contains documents with a field item and an array field ratings:

{
  _id: 1,
  item: "ABC",
  ratings: [ { scores: [ { q1: 2, q2: 4 }, { q1: 3, q2: 8 } ], loc: "A" },
             { scores: [ { q1: 2, q2: 5 } ], loc: "B" } ]
}
{
  _id: 2,
  item: "XYZ",
  ratings: [ { scores: [ { q1: 7 }, { q1: 2, q2: 8 } ], loc: "B" } ]
}

Create a compound index on the ratings.scores.q1 and the ratings.scores.q2 fields:

db.survey3.createIndex( { "ratings.scores.q1": 1, "ratings.scores.q2": 1 } )

The fields "ratings.scores.q1" and "ratings.scores.q2" share the field path "ratings.scores" and the $elemMatch must be on that path.

The following query, however, uses an $elemMatch but not on the required path:

db.survey3.find( { ratings: { $elemMatch: { 'scores.q1': 2, 'scores.q2': 8 } } } )

As such, MongoDB cannot compound the bounds, and the "ratings.scores.q2" field will be unconstrained during the index scan. To compound the bounds, the query must use $elemMatch on the path "ratings.scores":

db.survey3.find( { 'ratings.scores': { $elemMatch: { 'q1': 2, 'q2': 8 } } } )
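Besides producing different bounds, the two queries also match different documents. The sketch below mimics their matching semantics in plain JavaScript against the sample survey3 documents; it is an illustration under that simplified model, with hypothetical variable names, and does not involve an index.

```javascript
const survey3 = [
  {
    _id: 1,
    item: "ABC",
    ratings: [
      { scores: [{ q1: 2, q2: 4 }, { q1: 3, q2: 8 }], loc: "A" },
      { scores: [{ q1: 2, q2: 5 }], loc: "B" },
    ],
  },
  {
    _id: 2,
    item: "XYZ",
    ratings: [{ scores: [{ q1: 7 }, { q1: 2, q2: 8 }], loc: "B" }],
  },
];

// Mimics { ratings: { $elemMatch: { 'scores.q1': 2, 'scores.q2': 8 } } }:
// within one ratings element, q1 and q2 may come from different scores entries.
const onRatings = survey3
  .filter((doc) =>
    doc.ratings.some(
      (r) => r.scores.some((s) => s.q1 === 2) && r.scores.some((s) => s.q2 === 8)
    )
  )
  .map((doc) => doc._id);

// Mimics { 'ratings.scores': { $elemMatch: { q1: 2, q2: 8 } } }:
// a single scores entry must have both q1 equal to 2 and q2 equal to 8.
const onScores = survey3
  .filter((doc) =>
    doc.ratings.some((r) => r.scores.some((s) => s.q1 === 2 && s.q2 === 8))
  )
  .map((doc) => doc._id);

console.log(onRatings); // [ 1, 2 ]
console.log(onScores);  // [ 2 ]
```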