Geo Indexes

Introduction to Geo Indexes

This is an introduction to ArangoDB's geo indexes.

AQL's geographic features are described in Geo functions.

ArangoDB uses Hilbert curves to implement geo-spatial indexes. See this blog for details.

A geo-spatial index assumes that the latitude is between -90 and 90 degrees and the longitude is between -180 and 180 degrees. A geo index ignores all documents that do not fulfill these requirements.
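
The range requirement can also be checked on the application side before documents are saved. The following is a minimal sketch in plain JavaScript. The helper name isValidCoordinate is hypothetical and not part of ArangoDB, and the exact validation the server performs may differ.

```javascript
// Hypothetical helper, not part of ArangoDB: mirrors the range rule stated above.
function isValidCoordinate(latitude, longitude) {
  return typeof latitude === "number" && typeof longitude === "number" &&
    Number.isFinite(latitude) && Number.isFinite(longitude) &&
    latitude >= -90 && latitude <= 90 &&
    longitude >= -180 && longitude <= 180;
}

console.log(isValidCoordinate(48.1, 11.6)); // inside both ranges
console.log(isValidCoordinate(91, 0));      // latitude out of range
```

A document that fails such a check would simply not show up in geo queries, so rejecting it early makes the omission visible.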

Accessing Geo Indexes from the Shell

Ensures that a geo index exists:

collection.ensureIndex({ type: "geo", fields: [ "location" ] })

Creates a geo-spatial index on all documents using location as the path to the coordinates. The value of the attribute has to be an array with at least two numeric values. The array must contain the latitude (first value) and the longitude (second value).

All documents that do not have the attribute path, or that have a non-conforming value in it, are excluded from the index.
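
To see which documents such an index would skip, the shape rule can be approximated in plain JavaScript. This is only a sketch: getPath and hasIndexableLocation are hypothetical helpers, not ArangoDB functions, and the server's actual check may be stricter.

```javascript
// Hypothetical helpers, not part of ArangoDB: approximate the shape rule described above.

// Resolve a dotted attribute path such as "address.location" on a document.
function getPath(doc, path) {
  return path.split(".").reduce(function (value, key) {
    return (value !== null && typeof value === "object") ? value[key] : undefined;
  }, doc);
}

// True if the value at `path` is an array whose first two entries are numbers.
function hasIndexableLocation(doc, path) {
  var value = getPath(doc, path);
  return Array.isArray(value) &&
    value.length >= 2 &&
    typeof value[0] === "number" &&
    typeof value[1] === "number";
}

var docs = [
  { name: "a", location: [ 50.9, 6.9 ] }, // array of two numbers: qualifies
  { name: "b", location: "50.9,6.9" },    // not an array: skipped
  { name: "c" }                           // attribute missing: skipped
];
var indexable = docs.filter(function (doc) {
  return hasIndexableLocation(doc, "location");
});
console.log(indexable.map(function (doc) { return doc.name; }));
```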

A geo index is implicitly sparse, and there is no way to control its sparsity.

If the index is created successfully, an object with the index details, including the index identifier, is returned.

To create a geo index on an array attribute that contains longitude first, set the geoJson attribute to true. This corresponds to the Position format described in RFC 7946:

collection.ensureIndex({ type: "geo", fields: [ "location" ], geoJson: true })
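
The two orderings are easy to mix up. As a small illustration, the sketch below builds the same location in both orders. The helper names toLatLon and toGeoJsonPosition and the sample coordinates are assumptions made for this example, not part of ArangoDB.

```javascript
// Hypothetical helpers, not part of ArangoDB.

// Default geo index order: [ latitude, longitude ].
function toLatLon(latitude, longitude) {
  return [ latitude, longitude ];
}

// Order expected with geoJson: true, as in an RFC 7946 Position: [ longitude, latitude ].
function toGeoJsonPosition(latitude, longitude) {
  return [ longitude, latitude ];
}

// Illustrative coordinates (roughly 50.94 N, 6.96 E).
var plainDoc = { name: "sample", location: toLatLon(50.94, 6.96) };
var geoJsonDoc = { name: "sample", location: toGeoJsonPosition(50.94, 6.96) };
console.log(plainDoc.location, geoJsonDoc.location);
```

Documents written in one order but indexed with the other setting would be placed at the wrong position, or dropped from the index when the swapped values fall outside the valid latitude range.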

To create a geo-spatial index on all documents using latitude and longitude as separate attribute paths, two paths need to be specified in the fields array:

collection.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] })

If the index is created successfully, an object with the index details, including the index identifier, is returned.

Examples

Create a geo index for an array attribute:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
arangosh> for (i = -90;  i <= 90;  i += 10) {
........>     for (j = -180; j <= 180; j += 10) {
........>         db.geo.save({ name : "Name/" + i + "/" + j, loc: [ i, j ] });
........>     }
........> }
arangosh> db.geo.count();
arangosh> db.geo.near(0, 0).limit(3).toArray();
arangosh> db.geo.near(0, 0).count();

Create a geo index for an object attribute with separate latitude and longitude sub-attributes:

arangosh> db.geo2.ensureIndex({ type: "geo", fields: [ "location.latitude", "location.longitude" ] });
arangosh> for (i = -90;  i <= 90;  i += 10) {
........>     for (j = -180; j <= 180; j += 10) {
........>         db.geo2.save({ name : "Name/" + i + "/" + j, location: { latitude : i, longitude : j } });
........>     }
........> }
arangosh> db.geo2.near(0, 0).limit(3).toArray();

Use a geo index with the AQL SORT statement:

arangosh> db.geoSort.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] });
arangosh> for (i = -90;  i <= 90;  i += 10) {
........>     for (j = -180; j <= 180; j += 10) {
........>         db.geoSort.save({ name : "Name/" + i + "/" + j, latitude : i, longitude : j });
........>     }
........> }
arangosh> var query = "FOR doc in geoSort SORT DISTANCE(doc.latitude, doc.longitude, 0, 0) LIMIT 5 RETURN doc"
arangosh> db._explain(query, {}, {colors: false});
arangosh> db._query(query);

Use a geo index with the AQL FILTER statement:

arangosh> db.geoFilter.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] });
arangosh> for (i = -90;  i <= 90;  i += 10) {
........>     for (j = -180; j <= 180; j += 10) {
........>         db.geoFilter.save({ name : "Name/" + i + "/" + j, latitude : i, longitude : j });
........>     }
........> }
arangosh> var query = "FOR doc in geoFilter FILTER DISTANCE(doc.latitude, doc.longitude, 0, 0) < 2000 RETURN doc"
arangosh> db._explain(query, {}, {colors: false});
arangosh> db._query(query);

Constructs a geo index selection:

collection.geo(location_attribute)

Looks up a geo index defined on the attribute location_attribute. Returns a geo index object if an index was found. The near or within operators can then be used to execute a geo-spatial query on this particular index. This is useful for collections with multiple defined geo indexes.

collection.geo(location_attribute, true)

Looks up a geo index on a compound attribute location_attribute. Returns a geo index object if an index was found. The near or within operators can then be used to execute a geo-spatial query on this particular index.

collection.geo(latitude_attribute, longitude_attribute)

Looks up a geo index defined on the two attributes latitude_attribute and longitude_attribute. Returns a geo index object if an index was found. The near or within operators can then be used to execute a geo-spatial query on this particular index.

Note: this method is not yet supported by the RocksDB storage engine.

Note: the geo simple query helper function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for running geo queries is to use their AQL equivalents.

Examples

Assume you have a location stored as a list in the attribute home and a destination stored in the attribute work. You can then use the geo operator to select which geo-spatial attribute (and thus which index) to use in a near query.

arangosh> for (i = -90;  i <= 90;  i += 10) {
........>  for (j = -180;  j <= 180;  j += 10) {
........>    db.complex.save({ name : "Name/" + i + "/" + j,
........>                      home : [ i, j ],
........>                      work : [ -i, -j ] });
........>  }
........> }
........> 
arangosh> db.complex.near(0, 170).limit(5);
arangosh> db.complex.ensureIndex({ type: "geo", fields: [ "home" ] });
arangosh> db.complex.near(0, 170).limit(5).toArray();
arangosh> db.complex.geo("work").near(0, 170).limit(5);
arangosh> db.complex.ensureIndex({ type: "geo", fields: [ "work" ] });
arangosh> db.complex.geo("work").near(0, 170).limit(5).toArray();

Constructs a near query for a collection:

collection.near(latitude, longitude)

The returned list is sorted by distance, with the document nearest to the coordinate (latitude, longitude) coming first. If there are near documents of equal distance, documents are chosen randomly from this set until the limit is reached. It is possible to change the limit using the limit operator.

In order to use the near operator, a geo index must be defined for the collection. This index also defines which attribute holds the coordinates for the document. If you have more than one geo-spatial index, you can use the geo operator to select a particular index.

Note: near does not support negative skips. However, you can still use limit followed by skip.

collection.near(latitude, longitude).limit(limit)

Limits the result to limit documents instead of the default 100.

Note: Unlike with multiple explicit limits, limit will raise the implicit default limit imposed by near.

collection.near(latitude, longitude).distance()

This will add an attribute distance to all documents returned, which contains the distance between the given point and the document in meters.

collection.near(latitude, longitude).distance(name)

This will add an attribute name to all documents returned, which contains the distance between the given point and the document in meters.

Note: this method is not yet supported by the RocksDB storage engine.

Note: the near simple query function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for retrieving documents from a collection using the near operator is to use the AQL NEAR function in an AQL query as follows:

FOR doc IN NEAR(@@collection, @latitude, @longitude, @limit)
    RETURN doc
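
As a sketch of how this query could be run from arangosh, the bind parameters can be passed as the second argument to db._query. The collection name geo and the values below are assumptions taken from the examples in this section. The guard around db only exists so that the snippet also loads outside arangosh.

```javascript
// The AQL NEAR query from above, written on one line.
var nearQuery = "FOR doc IN NEAR(@@collection, @latitude, @longitude, @limit) RETURN doc";

// Bind parameters; a collection parameter (@@collection) uses a key with a leading "@".
var nearBindVars = {
  "@collection": "geo", // assumed collection name, as in the examples of this section
  latitude: 0,
  longitude: 0,
  limit: 2
};

// `db` and `print` are only available inside arangosh; elsewhere this block is skipped.
if (typeof db !== "undefined" && typeof db._query === "function") {
  print(db._query(nearQuery, nearBindVars).toArray());
}
```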

Examples

To get the nearest two locations:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
arangosh> for (var i = -90;  i <= 90;  i += 10) {
........>   for (var j = -180; j <= 180; j += 10) {
........>     db.geo.save({
........>        name : "Name/" + i + "/" + j,
........>        loc: [ i, j ] });
........> } }
arangosh> db.geo.near(0, 0).limit(2).toArray();

If you need the distance as well, then you can use the distance operator:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
arangosh> for (var i = -90;  i <= 90;  i += 10) {
........>  for (var j = -180; j <= 180; j += 10) {
........>     db.geo.save({
........>         name : "Name/" + i + "/" + j,
........>         loc: [ i, j ] });
........> } }
arangosh> db.geo.near(0, 0).distance().limit(2).toArray();

Constructs a within query for a collection:

collection.within(latitude, longitude, radius)

This will find all documents within a given radius around the coordinate (latitude, longitude). The returned array is sorted by distance, beginning with the nearest document.

In order to use the within operator, a geo index must be defined for the collection. This index also defines which attribute holds the coordinates for the document. If you have more than one geo-spatial index, you can use the geo operator to select a particular index.

collection.within(latitude, longitude, radius).distance()

This will add an attribute _distance to all documents returned, which contains the distance between the given point and the document in meters.

collection.within(latitude, longitude, radius).distance(name)

This will add an attribute name to all documents returned, which contains the distance between the given point and the document in meters.

Note: this method is not yet supported by the RocksDB storage engine.

Note: the within simple query function is deprecated as of ArangoDB 2.6. The function may be removed in future versions of ArangoDB. The preferred way for retrieving documents from a collection using the within operator is to use the AQL WITHIN function in an AQL query as follows:

FOR doc IN WITHIN(@@collection, @latitude, @longitude, @radius, @distanceAttributeName)
    RETURN doc
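
Because the radius is expressed in meters, a small wrapper that accepts kilometers can help avoid unit mistakes. The function withinQuery below is a hypothetical helper written for this example, not an ArangoDB API. It only assembles the query string and bind parameters, which could then be passed to db._query in arangosh.

```javascript
// Hypothetical helper: packages the AQL WITHIN call shown above with its bind parameters.
function withinQuery(collectionName, latitude, longitude, radiusKm, distanceAttributeName) {
  return {
    query: "FOR doc IN WITHIN(@@collection, @latitude, @longitude, @radius, " +
           "@distanceAttributeName) RETURN doc",
    bindVars: {
      "@collection": collectionName,
      latitude: latitude,
      longitude: longitude,
      radius: radiusKm * 1000, // convert kilometers to meters
      distanceAttributeName: distanceAttributeName
    }
  };
}

// 2000 km around (0, 0), with the distance stored in an attribute named "distance".
var within2000 = withinQuery("geo", 0, 0, 2000, "distance");
console.log(within2000.bindVars.radius);
```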

Examples

To find all documents within a radius of 2000 km use:

arangosh> for (var i = -90;  i <= 90;  i += 10) {
........>   for (var j = -180; j <= 180; j += 10) {
........>     db.geo.save({ name : "Name/" + i + "/" + j, loc: [ i, j ] });
........>   }
........> }
arangosh> db.geo.within(0, 0, 2000 * 1000).distance().toArray();

Ensures that a geo index exists:

collection.ensureIndex({ type: "geo", fields: [ "location" ] })

Since ArangoDB 2.5, this method is an alias for ensureGeoIndex, because geo indexes are always sparse: documents that do not contain the index attributes, or that have non-numeric values in the index attributes, are not indexed. ensureGeoConstraint is deprecated; use ensureGeoIndex instead.

The index does not provide a unique option because of its limited usability. It would only prevent identical coordinates from being inserted, while even a slightly different location (1 inch or 1 cm off, say) would be unique again and not considered a duplicate, although it probably should be. The desired threshold for detecting duplicates may vary from project to project (as may the way the distance is calculated) and needs to be implemented on the application layer as needed. You can write a Foxx service for this purpose and make use of the AQL geo functions to find nearby coordinates supported by a geo index.
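
To illustrate the threshold idea only, the following plain-JavaScript sketch treats two locations as duplicates when they are closer than a chosen distance. It uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption of this example. ArangoDB's own distance calculation may differ, and in a real service the candidate lookup would go through the AQL geo functions mentioned above. The function names are hypothetical.

```javascript
// Assumption: mean Earth radius in meters; ArangoDB's own distance model may differ.
var EARTH_RADIUS_M = 6371000;

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance in meters (haversine formula) between two [ latitude, longitude ] pairs.
function haversineMeters(a, b) {
  var dLat = toRadians(b[0] - a[0]);
  var dLon = toRadians(b[1] - a[1]);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(toRadians(a[0])) * Math.cos(toRadians(b[0])) *
    Math.pow(Math.sin(dLon / 2), 2);
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Hypothetical duplicate rule: locations closer than thresholdMeters count as the same place.
function isNearDuplicate(a, b, thresholdMeters) {
  return haversineMeters(a, b) < thresholdMeters;
}

console.log(isNearDuplicate([ 48.0, 11.0 ], [ 48.0, 11.0000001 ], 1)); // under a centimeter apart
console.log(isNearDuplicate([ 48.0, 11.0 ], [ 48.0, 12.0 ], 1));       // one degree of longitude apart
```

The threshold and the distance formula are exactly the project-specific choices the paragraph above refers to, which is why they are parameters here rather than fixed values.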