Durability Configuration

Global Configuration

There are global configuration values for durability, which can be adjusted by specifying the following configuration options:

--database.wait-for-sync boolean
Default wait-for-sync behavior. Sets the default waitForSync value, which can be overridden when creating a new collection. The default is false.

--database.force-sync-properties boolean
Force syncing of collection properties to disk after creating a collection or updating its properties. If turned off, no fsync will happen for the collection and database properties stored in parameter.json files in the file system. Turning off this option will speed up workloads that create and drop a lot of collections (e.g. test suites). The default is true.

--wal.sync-interval
Interval for automatic, non-requested disk syncs. The interval (in milliseconds) that ArangoDB will use to automatically synchronize data in its write-ahead logs to disk. Automatic syncs will only be performed for not-yet synchronized data, and only for operations that have been executed without the waitForSync attribute.
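As a minimal sketch, the three options above could be collected in one place and turned into command-line tokens for the server. The helper name toStartupArgs and all values shown are assumptions for illustration only, not recommendations.

```javascript
// Illustrative values only; they are not recommendations.
const durabilityOptions = {
  "database.wait-for-sync": false,
  "database.force-sync-properties": true,
  "wal.sync-interval": 100 // milliseconds, assumed example value
};

// Hypothetical helper: turn an options object into command-line tokens
// of the form "--name value".
function toStartupArgs(options) {
  const args = [];
  for (const name of Object.keys(options)) {
    args.push("--" + name, String(options[name]));
  }
  return args;
}

console.log(toStartupArgs(durabilityOptions).join(" "));
```

The resulting tokens can be appended to the server start command or translated into the corresponding configuration file entries.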

Per-collection configuration

You can also configure the durability behavior on a per-collection basis. Use the ArangoDB shell to change these properties.

The properties method gets or sets the properties of a collection.

collection.properties()
Returns an object containing all collection properties:

- waitForSync: If true, creating a document will only return after the data has been synced to disk.
- journalSize: The size of the journal in bytes. This option is meaningful for the MMFiles storage engine only.
- isVolatile: If true, the collection data will be kept in memory only and ArangoDB will not write or sync the data to disk. This option is meaningful for the MMFiles storage engine only.
- keyOptions (optional): Additional options for key generation. This is a JSON object containing the following attributes (note: some of the attributes are optional):
  - type: the type of the key generator used for the collection.
  - allowUserKeys: if set to true, callers may supply their own key values in the _key attribute of a document. If set to false, the key generator is solely responsible for generating keys, and supplying a key value in the _key attribute of a document is considered an error.
  - increment: increment value for the autoincrement key generator. Not used for other key generator types.
  - offset: initial offset value for the autoincrement key generator. Not used for other key generator types.
- indexBuckets: The number of buckets into which indexes using a hash table are split. The default is 16; the number has to be a power of 2 and less than or equal to 1024. This option is meaningful for the MMFiles storage engine only. For very large collections, increase this value to avoid long pauses when the hash table has to be initially built or resized, since buckets are resized individually and can be initially built in parallel. For example, 64 might be a sensible value for a collection with 100 000 000 documents. Currently, only the edge index respects this value, but other index types might follow in future ArangoDB versions. Changes (see below) are applied the next time the collection is loaded.

In a cluster setup, the result will also contain the following attributes:

- numberOfShards: the number of shards of the collection.
- shardKeys: contains the names of document attributes that are used to determine the target shard for documents.
- replicationFactor: determines how many copies of each shard are kept on different DBServers.

collection.properties(properties)
Changes the collection properties. properties must be an object with one or more of the following attributes:

- waitForSync: If true, creating a document will only return after the data has been synced to disk.
- journalSize: The size of the journal in bytes. This option is meaningful for the MMFiles storage engine only.
- indexBuckets: See above. Changes are only applied the next time the collection is loaded. This option is meaningful for the MMFiles storage engine only.
- replicationFactor: Changes the number of shard copies kept on different DBServers. Valid values are integers in the range 1-10 (cluster only).

Note: it is not possible to change the journal size after the journal or datafile has been created. Changing this parameter will only affect newly created journals. Also note that you cannot lower the journal size to less than the size of the largest document already stored in the collection.

Note: some other collection properties, such as type, isVolatile, or keyOptions, cannot be changed once the collection is created.
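The constraints listed above can be checked before calling collection.properties(properties). The following is a minimal sketch; checkPropertiesUpdate and isPowerOfTwo are hypothetical helpers, not part of ArangoDB, and the server performs its own validation regardless.

```javascript
// Attributes that, according to the notes above, cannot be changed
// once the collection is created.
const IMMUTABLE = ["type", "isVolatile", "keyOptions"];

function isPowerOfTwo(n) {
  return Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
}

// Hypothetical helper: returns a list of problems found in a
// properties update object (an empty list means none were found).
function checkPropertiesUpdate(props) {
  const problems = [];
  for (const name of IMMUTABLE) {
    if (Object.prototype.hasOwnProperty.call(props, name)) {
      problems.push(name + " cannot be changed after the collection is created");
    }
  }
  if ("indexBuckets" in props) {
    const buckets = props.indexBuckets;
    if (!isPowerOfTwo(buckets) || buckets > 1024) {
      problems.push("indexBuckets must be a power of 2 and at most 1024");
    }
  }
  if ("replicationFactor" in props) {
    const factor = props.replicationFactor;
    if (!Number.isInteger(factor) || factor < 1 || factor > 10) {
      problems.push("replicationFactor must be an integer from 1 to 10");
    }
  }
  return problems;
}
```

In arangosh, an update such as { indexBuckets: 64 } could be passed through checkPropertiesUpdate first and handed to db.example.properties() only if the returned list is empty.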

Examples

Read all properties

arangosh> db.example.properties();
{ 
  "doCompact" : true, 
  "journalSize" : 33554432, 
  "isSystem" : false, 
  "isVolatile" : false, 
  "waitForSync" : false, 
  "keyOptions" : { 
    "type" : "traditional", 
    "allowUserKeys" : true, 
    "lastValue" : 0 
  }, 
  "indexBuckets" : 8 
}

Change a property

arangosh> db.example.properties({ waitForSync : true });
{ 
  "doCompact" : true, 
  "journalSize" : 33554432, 
  "isSystem" : false, 
  "isVolatile" : false, 
  "waitForSync" : true, 
  "keyOptions" : { 
    "type" : "traditional", 
    "allowUserKeys" : true, 
    "lastValue" : 0 
  }, 
  "indexBuckets" : 8 
}

Per-operation configuration

Many data-modification operations, as well as ArangoDB's transactions, accept a waitForSync attribute. When it is set to true, the operation's data has been synchronized to disk by the time the operation returns.
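A per-operation request might look like the following sketch. The wrapper insertDurably is a hypothetical helper; it assumes the collection object offers insert(document, options) with a waitForSync option, as arangosh collection objects do in ArangoDB 3.x.

```javascript
// Hypothetical helper: insert a document and return only after the
// data has been synced to disk, regardless of the collection default.
// `collection` is expected to be an arangosh collection object such as db.example.
function insertDurably(collection, doc) {
  return collection.insert(doc, { waitForSync: true });
}
```

In arangosh this would be called as insertDurably(db.example, { name: "test" }). Operations executed without the attribute fall back to the collection's waitForSync property.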

Disk-Usage Configuration

The amount of disk space used by ArangoDB is determined by a few configuration options.

Global Configuration

The total amount of disk storage required by ArangoDB is determined by the size of the write-ahead logfiles plus the sizes of the collection journals and datafiles.

The following options configure the number and size of the write-ahead logfiles:

--wal.reserve-logfiles
Maximum number of reserve logfiles. The maximum number of reserve logfiles that ArangoDB will create in a background process. Reserve logfiles are useful when an operation needs to be written to a logfile but the remaining space in the logfile is too low for storing the operation. In this case, a new logfile needs to be created to store the operation. Creating new logfiles is normally slow, so ArangoDB will try to pre-create logfiles in a background process so there are always reserve logfiles when the active logfile gets full. The number of reserve logfiles that ArangoDB keeps in the background is configurable with this option.

--wal.historic-logfiles
Maximum number of historic logfiles. The maximum number of historic logfiles that ArangoDB will keep after they have been garbage-collected. If no replication is used, there is no need to keep historic logfiles except for having a local changelog. In a replication setup, the number of historic logfiles affects the amount of data a slave can fetch from the master's logs. The more historic logfiles, the more historic data is available for a slave, which is useful if the connection between master and slave is unstable or slow. Not having enough historic logfiles available might lead to logfile data being deleted on the master before a slave has fetched it.

--wal.logfile-size
The size of each WAL logfile. Specifies the filesize (in bytes) for each write-ahead logfile. The logfile size should be chosen so that each logfile can store a considerable number of documents. The bigger the logfile size, the longer it will take to fill up a single logfile, which also influences the delay until the data in a logfile will be garbage-collected and written to collection journals and datafiles. It also affects how long logfile recovery will take at server start.

--wal.allow-oversize-entries
Whether or not oversize entries are allowed, that is, whether individual documents that are bigger than a single logfile can be stored. Setting the option to false will make such operations fail with an error. Setting the option to true will make such operations succeed, but with a potentially high performance impact. The reason is that for each oversize operation, an individual oversize logfile needs to be created, which may also block other operations. The option should be set to false if it is certain that documents will always be smaller than a single logfile.

When data gets copied from the write-ahead logfiles into the journals or datafiles of collections, files will be created on the collection level. How big these files are is determined by the following global configuration value:

--database.maximal-journal-size size
Maximal size of a journal in bytes. Can be overridden when creating a new collection. Note that this also limits the maximal size of a single document. The default is 32 MB.

Per-collection configuration

The journal size can also be adjusted on a per-collection level using the collection's properties method.
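As a minimal sketch of such an adjustment, the helper below converts a size given in MB (taken here as 1024 * 1024 bytes) to bytes and passes it to the properties method. Both mib and setJournalSize are hypothetical names introduced for illustration.

```javascript
// Convert a size in MB (1024 * 1024 bytes each) to bytes;
// journalSize is specified in bytes.
function mib(n) {
  return n * 1024 * 1024;
}

// Hypothetical helper: set the journal size of an existing collection.
// `collection` is expected to be an arangosh collection object such as db.example.
// Only journals created after the change are affected.
function setJournalSize(collection, megabytes) {
  return collection.properties({ journalSize: mib(megabytes) });
}
```

For example, mib(32) yields 33554432, the journalSize value shown in the example output earlier. In arangosh the helper would be called as setJournalSize(db.example, 64).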