Incorrect or missing data (server issue)

Data is missing from a query response, or the returned data is wrong (potentially due to server issues)

Sometimes, especially with development builds between releases, it's possible that query results are missing due to issues in some component of Couchbase Server. This section describes how to do some debugging to identify which component is at fault, or at least to rule out the components that are not.

Before proceeding, it is worth mentioning that each vbucket is physically represented by a CouchDB database (generated by the couchstore component), which corresponds to exactly one file in the filesystem. The following example is from a development environment using only 16 vbuckets (for simplicity), 4 nodes and no replicas enabled:

> tree ns_server/couch/0/
ns_server/couch/0/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 0.couch.1
    ├── 1.couch.1
    ├── 2.couch.1
    ├── 3.couch.1
    ├── master.couch.1
    └── stats.json

1 directory, 8 files

> tree ns_server/couch/1/
ns_server/couch/1/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 4.couch.1
    ├── 5.couch.1
    ├── 6.couch.1
    ├── 7.couch.1
    ├── master.couch.1
    ├── stats.json
    └── stats.json.old

1 directory, 9 files

> tree ns_server/couch/2/
ns_server/couch/2/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 10.couch.1
    ├── 11.couch.1
    ├── 8.couch.1
    ├── 9.couch.1
    ├── master.couch.1
    ├── stats.json
    └── stats.json.old

1 directory, 9 files

> tree ns_server/couch/3/
ns_server/couch/3/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 12.couch.1
    ├── 13.couch.1
    ├── 14.couch.1
    ├── 15.couch.1
    ├── master.couch.1
    ├── stats.json
    └── stats.json.old

1 directory, 9 files

In this particular example, because no replicas are enabled (the cluster was started with ./cluster_connect -n 4 -r 0), each node only has database files for the vbuckets it is responsible for (active vbuckets). The numeric suffix in each database filename starts at 1 when the database file is created and is incremented by 1 every time the vbucket is compacted. If replication is enabled, for example with ./cluster_connect -n 4 -r 1, then each node has vbucket database files both for its active vbuckets and for some replica vbuckets, for example:

> tree ns_server/couch/0/
ns_server/couch/0/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 0.couch.1
    ├── 1.couch.1
    ├── 12.couch.1
    ├── 2.couch.1
    ├── 3.couch.1
    ├── 4.couch.1
    ├── 5.couch.1
    ├── 8.couch.1
    ├── master.couch.1
    └── stats.json

1 directory, 12 files

> tree ns_server/couch/1/
ns_server/couch/1/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 0.couch.1
    ├── 1.couch.1
    ├── 13.couch.1
    ├── 4.couch.1
    ├── 5.couch.1
    ├── 6.couch.1
    ├── 7.couch.1
    ├── 9.couch.1
    ├── master.couch.1
    └── stats.json

1 directory, 12 files

> tree ns_server/couch/2/
ns_server/couch/2/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 10.couch.1
    ├── 11.couch.1
    ├── 14.couch.1
    ├── 15.couch.1
    ├── 2.couch.1
    ├── 6.couch.1
    ├── 8.couch.1
    ├── 9.couch.1
    ├── master.couch.1
    └── stats.json

1 directory, 12 files

> tree ns_server/couch/3/
ns_server/couch/3/
├── _replicator.couch.1
├── _users.couch.1
└── default
    ├── 10.couch.1
    ├── 11.couch.1
    ├── 12.couch.1
    ├── 13.couch.1
    ├── 14.couch.1
    ├── 15.couch.1
    ├── 3.couch.1
    ├── 7.couch.1
    ├── master.couch.1
    └── stats.json

1 directory, 12 files

You can figure out which vbuckets are active in each node by querying the following URL:

> curl -s http://localhost:8091/pools/default/buckets | json_xs
[
   {
      "quota" : {
         "rawRAM" : 268435456,
         "ram" : 1073741824
      },
      "localRandomKeyUri" : "/pools/default/buckets/default/localRandomKey",
      "bucketCapabilitiesVer" : "",
      "authType" : "sasl",
      "uuid" : "89dd5c64504f4a9414a2d3bcf9630d15",
      "replicaNumber" : 1,
      "vBucketServerMap" : {
         "vBucketMap" : [
            [ 0, 1 ],
            [ 0, 1 ],
            [ 0, 2 ],
            [ 0, 3 ],
            [ 1, 0 ],
            [ 1, 0 ],
            [ 1, 2 ],
            [ 1, 3 ],
            [ 2, 0 ],
            [ 2, 1 ],
            [ 2, 3 ],
            [ 2, 3 ],
            [ 3, 0 ],
            [ 3, 1 ],
            [ 3, 2 ],
            [ 3, 2 ]
         ],
         "numReplicas" : 1,
         "hashAlgorithm" : "CRC",
         "serverList" : [
            "192.168.1.81:12000",
            "192.168.1.82:12002",
            "192.168.1.83:12004",
            "192.168.1.84:12006"
         ]
      },

(....)
]

The field to look at is named vBucketServerMap, and it contains two important sub-fields, named vBucketMap and serverList, which we use to find out which nodes are responsible for which vbuckets (active vbuckets).

Looking at these 2 fields, we can derive the following active and replica vbucket-to-node mapping:

  • vbuckets 0, 1, 2 and 3 are active at node 192.168.1.81:12000, and vbuckets 4, 5, 8 and 12 are replicas at that same node

  • vbuckets 4, 5, 6 and 7 are active at node 192.168.1.82:12002, and vbuckets 0, 1, 9 and 13 are replicas at that same node

  • vbuckets 8, 9, 10 and 11 are active at node 192.168.1.83:12004, and vbuckets 2, 6, 14 and 15 are replicas at that same node

  • vbuckets 12, 13, 14 and 15 are active at node 192.168.1.84:12006, and vbuckets 3, 7, 10 and 11 are replicas at that same node

The value of vBucketMap is an array of arrays of 2 elements. Each sub-array corresponds to a vbucket: the first one relates to vbucket 0, the second to vbucket 1, and so on, up to the last one for vbucket 15. Each sub-array element is an index (starting at 0) into the serverList array. The first element of each sub-array tells us which node (server) has the corresponding vbucket marked as active, while the second element tells us which server has this vbucket marked as replica.

If the replication factor is greater than 1 (N > 1), then each sub-array will have N + 1 elements: the first one is always the index of the server/node that has that vbucket active, and the remaining elements are the indexes of the servers holding the first, second, third, etc. replicas of that vbucket.
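
As an illustration of this decoding, the following minimal Python sketch derives the same mapping. It assumes the JSON returned by the curl command above was saved to a file named bucket.json (a name chosen here just for the example):

import json
from collections import defaultdict

# Decode vBucketServerMap into per-node active/replica vbucket lists.
with open("bucket.json") as f:
    buckets = json.load(f)

vbsm = buckets[0]["vBucketServerMap"]   # first bucket in the list ("default" in this example)
servers = vbsm["serverList"]
active, replica = defaultdict(list), defaultdict(list)

for vb, chain in enumerate(vbsm["vBucketMap"]):
    active[servers[chain[0]]].append(vb)        # first element: node with the active copy
    for idx in chain[1:]:                       # remaining elements: replica copies
        if idx >= 0:                            # -1 means no replica assigned
            replica[servers[idx]].append(vb)

for node in servers:
    print(node, "active:", active[node], "replica:", replica[node])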

Once we know which vbuckets are active in each node, we can use tools such as couch_dbinfo and couch_dbdump to analyze the active vbucket database files. Before looking at those tools, let's first understand what database sequence numbers are.

When a CouchDB database (remember, each one corresponds to a vbucket) is created, its update_seq (update sequence number) is 0. Whenever a document is created, updated or deleted, the database's update_seq is incremented by 1. So each of the following sequences of actions results in a final update_seq of 5 (see the sketch after this list):

  1. Create document doc1, create document doc2, create document doc3, create document doc4, create document doc5

  2. Create document doc1, update document doc1, update document doc1, update document doc1, delete document doc1

  3. Create document doc1, delete document doc1, create document doc2, update document doc2, update document doc2

  4. Create document doc1, create document doc2, create document doc3, create document doc4, update document doc2

  5. etc…
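
To make this bookkeeping concrete, here is a small toy model in Python (purely illustrative, not actual couchstore code) of how a vbucket database bumps its update_seq and keeps only the latest sequence number per document ID:

# Toy model of a vbucket database's by-sequence index (illustrative only).
class VBucketDB:
    def __init__(self):
        self.update_seq = 0      # starts at 0 for a new database
        self.by_seq = {}         # seq -> (doc_id, deleted?)
        self.by_id = {}          # doc_id -> current seq

    def _record(self, doc_id, deleted):
        self.update_seq += 1                     # every create/update/delete bumps the seq
        old_seq = self.by_id.get(doc_id)
        if old_seq is not None:
            del self.by_seq[old_seq]             # only the most recent seq per doc ID is kept
        self.by_seq[self.update_seq] = (doc_id, deleted)
        self.by_id[doc_id] = self.update_seq

    def create(self, doc_id): self._record(doc_id, False)
    def update(self, doc_id): self._record(doc_id, False)
    def delete(self, doc_id): self._record(doc_id, True)   # a delete keeps a tombstone record

db = VBucketDB()
for op in ["create doc1", "delete doc1", "create doc2", "update doc2", "update doc2"]:
    action, doc = op.split()
    getattr(db, action)(doc)
print(db.update_seq)   # 5, matching sequence 3 in the list above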

You can see the current update_seq of a vbucket database file, amongst other information, with the couch_dbinfo command line tool. Here is an example for vbucket 0, active in the first node:

> ./install/bin/couch_dbinfo ns_server/couch/0/default/0.couch.1
 DB Info
  (ns_server/couch/0/default/0.couch.1)
    file format version: 10
    update_seq: 31250
    doc count: 31250
    deleted doc count: 0
    data size: 3.76 MB
    B-tree size: 1.66 MB
    total disk size: 5.48 MB

After updating all the documents in that vbucket database, the update_seq doubled:

> ./install/bin/couch_dbinfo ns_server/couch/0/default/0.couch.1
DB Info
 (ns_server/couch/0/default/0.couch.1)
   file format version: 10
   update_seq: 62500
   doc count: 31250
   deleted doc count: 0
   data size: 3.76 MB
   B-tree size: 1.75 MB
   total disk size: 10.50 MB

An important detail, if not obvious, is that each vbucket database sequence number is associated with one and only one document ID. At any time, there is only one update sequence number associated with a document ID, and it is always the most recent one. We can verify this with the couch_dbdump command line tool. Take the following example, where we only have 2 documents, one with ID doc1 and another with ID doc2:

> ./install/bin/couch_dbdump ns_server/couch/0/default/0.couch.1
Doc seq: 1
     id: doc1
     rev: 1
     content_meta: 0
     cas: 130763975746, expiry: 0, flags: 0
     data: {"value": 1}
Total docs: 1

On an empty vbucket 0 database, we created document with ID doc1, which has a JSON value of {"value": 1}. This document is now associated with update sequence number 1. Next we create another document, with ID doc2 and JSON value {"value": 2}, and the output of couch_dbdump is:

> ./install/bin/couch_dbdump ns_server/couch/0/default/0.couch.1
Doc seq: 1
     id: doc1
     rev: 1
     content_meta: 0
     cas: 130763975746, expiry: 0, flags: 0
     data: {"value": 1}
Doc seq: 2
     id: doc2
     rev: 1
     content_meta: 0
     cas: 176314689876, expiry: 0, flags: 0
     data: {"value": 2}
Total docs: 2

Document doc2 is now associated with vbucket 0 database update sequence number 2. Next, we update document doc1 with a new JSON value of {"value": 1111}, and couch_dbdump tells us:

> ./install/bin/couch_dbdump ns_server/couch/0/default/0.couch.1
Doc seq: 2
     id: doc2
     rev: 1
     content_meta: 0
     cas: 176314689876, expiry: 0, flags: 0
     data: {"value": 2}
Doc seq: 3
     id: doc1
     rev: 2
     content_meta: 0
     cas: 201537725466, expiry: 0, flags: 0
     data: {"value": 1111}

Total docs: 2

So, document doc1 is now associated with update sequence number 3. Note that it is no longer associated with sequence number 1, because the update was the most recent operation against that document (remember, only 3 operations are possible: create, update or delete). The database no longer has a record for sequence number 1 either. After this, we update document doc2 with JSON value {"value": 2222}, and we get the following output from couch_dbdump:

> ./install/bin/couch_dbdump ns_server/couch/0/default/0.couch.1
Doc seq: 3
     id: doc1
     rev: 2
     content_meta: 0
     cas: 201537725466, expiry: 0, flags: 0
     data: {"value": 1111}
Doc seq: 4
     id: doc2
     rev: 2
     content_meta: 0
     cas: 213993873979, expiry: 0, flags: 0
     data: {"value": 2222}

Total docs: 2

Document doc2 is now associated with sequence number 4, and sequence number 2 no longer has a record in the database file. Finally, we delete document doc1, and then we get:

> ./install/bin/couch_dbdump ns_server/couch/0/default/0.couch.1
Doc seq: 4
     id: doc2
     rev: 2
     content_meta: 0
     cas: 213993873979, expiry: 0, flags: 0
     data: {"value": 2222}
Doc seq: 5
     id: doc1
     rev: 3
     content_meta: 3
     cas: 201537725467, expiry: 0, flags: 0
     doc deleted
     could not read document body: document not found

Total docs: 2

Note that document deletes don't really remove documents from the database files; instead, they flag the document as deleted and remove its JSON (or binary) value. Document doc1 is now associated with sequence number 5, and the record for its previously associated sequence number 3 is removed from the vbucket 0 database file. This is what tells the indexer to delete all key-value pairs previously emitted by a map function for the deleted document. Without an update sequence number associated with the delete operation, there would be no way to know that the document had been deleted.

These details of sequence numbers and document operations are what allow indexes to be updated incrementally in Couchbase Server (and Apache CouchDB as well).
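
To illustrate how those details enable incremental updates, here is a simplified sketch (illustrative Python only; the real view engine is Erlang and considerably more involved) of an index update driven by sequence numbers:

# Simplified sketch of an incremental index update driven by sequence numbers.
def update_index(index, changes_since, map_function):
    """index: {'last_seq': int, 'rows_by_doc': {doc_id: [view rows]}}
    changes_since(seq): yields (seq, doc_id, deleted, doc) for records with seq > the given one."""
    for seq, doc_id, deleted, doc in changes_since(index["last_seq"]):
        index["rows_by_doc"].pop(doc_id, None)     # drop rows emitted for the previous revision
        if not deleted:
            index["rows_by_doc"][doc_id] = map_function(doc)
        index["last_seq"] = seq                    # the header records the last processed seq
    return index

# Example: one document updated and one deleted since the last indexed seq (3).
changes = [(4, "doc2", False, {"value": 2222}), (5, "doc1", True, None)]
idx = {"last_seq": 3, "rows_by_doc": {"doc1": [(1111, None)], "doc2": [(2, None)]}}
update_index(idx, lambda since: (c for c in changes if c[0] > since),
             lambda doc: [(doc["value"], None)])
print(idx["last_seq"], idx["rows_by_doc"])   # 5 {'doc2': [(2222, None)]}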

In Couchbase Server, indexes store in their header (state) the last update_seq seen for each vbucket database. Put simply, whenever an index build/update finishes, it stores in its header the last update_seq processed for each vbucket database. Vbuckets also have states within indexes, and these states do not necessarily match the vbucket states in the server. For the goals of this wiki page, it is enough to say that a view request with stale=false is blocked only if the update_seq currently stored in the index header for any active vbucket is smaller than the current update_seq of the corresponding vbucket database; if this is true for at least one active vbucket, an index update is scheduled immediately (if not already running) and the request is unblocked when it finishes. Requests with stale=false are not blocked if the update_seqs of vbuckets with other states in the index (passive, cleanup, replica) are smaller than the current update_seqs of the corresponding vbucket databases, because queries only see rows produced for documents that live in the active vbuckets.
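
The blocking decision just described can be summarized with a short illustrative sketch (Python pseudocode of the rule above; the actual view engine is written in Erlang):

# Simplified sketch of the stale=false decision described above.
def must_block(index_header_seqs, current_db_seqs, active_vbuckets):
    """Return True if a stale=false request has to wait for an index update.

    index_header_seqs: vbucket -> last update_seq the index has processed
    current_db_seqs:   vbucket -> current update_seq of the vbucket database
    active_vbuckets:   vbuckets marked as active in the index
    """
    for vb in active_vbuckets:
        if index_header_seqs.get(vb, 0) < current_db_seqs[vb]:
            return True          # an active vbucket is behind: schedule/await an index update
    return False                 # passive/cleanup/replica vbuckets being behind does not block

# Example: vbucket 2 has mutations the index has not processed yet, so the request blocks.
print(must_block({0: 31250, 1: 31250, 2: 31200, 3: 31250},
                 {0: 31250, 1: 31250, 2: 31250, 3: 31250},
                 [0, 1, 2, 3]))   # True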

We can see the states of vbuckets in the index, and the update_seqs stored in the index, by querying the following URL (example for 16 vbuckets only, for the sake of simplicity):

> curl -s 'http://localhost:9500/_set_view/default/_design/dev_test2/_info' | json_xs
{
   "unindexable_partitions" : {},
   "passive_partitions" : [],
   "compact_running" : false,
   "cleanup_partitions" : [],
   "replica_group_info" : {
      "unindexable_partitions" : {},
      "passive_partitions" : [
         4,
         5,
         8,
         12
      ],
      "compact_running" : false,
      "cleanup_partitions" : [],
      "active_partitions" : [],
      "pending_transition" : null,
      "db_set_message_queue_len" : 0,
      "out_of_sync_db_set_partitions" : false,
      "expected_partition_seqs" : {
         "8" :00,
         "4" :00,
         "12" :00,
         "5" :00
      },
      "updater_running" : false,
      "partition_seqs" : {
         "8" :00,
         "4" :00,
         "12" :00,
         "5" :00
      },
      "stats" : {
         "update_history" : [
            {
               "deleted_ids" : 0,
               "inserted_kvs" : 38382,
               "inserted_ids" : 12794,
               "deleted_kvs" : 38382,
               "cleanup_kv_count" : 0,
               "blocked_time" : 1.5e-05,
               "indexing_time" : 3.861918
            }
         ],
         "updater_cleanups" : 0,
         "compaction_history" : [
            {
               "cleanup_kv_count" : 0,
               "duration" : 1.955801
            },
            {
               "cleanup_kv_count" : 0,
               "duration" : 2.443478
            },
            {
               "cleanup_kv_count" : 0,
               "duration" : 4.956397
            },
            {
               "cleanup_kv_count" : 0,
               "duration" : 9.522231
            }
         ],
         "full_updates" : 1,
         "waiting_clients" : 0,
         "compactions" : 4,
         "cleanups" : 0,
         "partial_updates" : 0,
         "stopped_updates" : 0,
         "cleanup_history" : [],
         "cleanup_interruptions" : 0
      },
      "initial_build" : false,
      "update_seqs" : {
         "8" :00,
         "4" :00,
         "12" :00,
         "5" :00
      },
      "partition_seqs_up_to_date" : true,
      "updater_state" : "not_running",
      "data_size" : 5740951,
      "cleanup_running" : false,
      "signature" : "440b0b3ded9d68abb559d58b9fda3e0a",
      "max_number_partitions" : 16,
      "disk_size" : 5742779
   },
   "active_partitions" : [
      0,
      1,
      2,
      3
   ],
   "pending_transition" : null,
   "db_set_message_queue_len" : 0,
   "out_of_sync_db_set_partitions" : false,
   "replicas_on_transfer" : [],
   "expected_partition_seqs" : {
      "1" :00,
      "3" :00,
      "0" :00,
      "2" :00
   },
   "updater_running" : false,
   "partition_seqs" : {
      "1" :00,
      "3" :00,
      "0" :00,
      "2" :00
   },
   "stats" : {
      "update_history" : [],
      "updater_cleanups" : 0,
      "compaction_history" : [],
      "full_updates" : 0,
      "waiting_clients" : 0,
      "compactions" : 0,
      "cleanups" : 0,
      "partial_updates" : 0,
      "stopped_updates" : 0,
      "cleanup_history" : [],
      "cleanup_interruptions" : 0
   },
   "initial_build" :   false,
   "replica_partitions" : [
      4,
      5,
      8,
      12
   ],
   "update_seqs" : {
      "1" : 31250,
      "3" : 31250,
      "0" : 31250,
      "2" : 31250
   },
   "partition_seqs_up_to_date" : true,
   "updater_state" :   "not_running",
   "data_size" : 5717080,
   "cleanup_running" : false,
   "signature" :   "440b0b3ded9d68abb559d58b9fda3e0a",
   "max_number_partitions" : 16,
   "disk_size" : 5726395
}

The output gives us several fields that are useful for diagnosing issues in the server. The field replica_group_info can be ignored for the goals of this wiki (it would only be useful during a failover); the information it contains is similar to the top-level information, which describes the main/principal index, the one we care about during steady state and during rebalance.

Some of the top level fields and their meaning:

  • active_partitions - this is a list with the IDs of all vbuckets marked as active in the index.

  • passive_partitions - this is a list with the IDs of all vbuckets marked as passive in the index.

  • cleanup_partitions - this is a list with the IDs of all vbuckets marked as cleanup in the index.

  • compact_running - true if index compaction is ongoing, false otherwise.

  • updater_running - true if index build/update is ongoing, false otherwise.

  • update_seqs - this tells us up to which vbucket database update_seqs the index reflects data; keys are vbucket IDs and values are update_seqs. The update_seqs here are always smaller than or equal to the values in partition_seqs and expected_partition_seqs. If the value of any update_seq here is smaller than the corresponding value in partition_seqs or expected_partition_seqs, then the index is not up to date (it is stale), and a subsequent query with stale=false will be blocked and will spawn an index update (if one is not already running).

  • partition_seqs - this tells us the current update_seqs for each vbucket database, as tracked by the view engine. If any update_seq value here is greater than the corresponding value in update_seqs, the index is not up to date (it is stale). See the description of update_seqs above.

  • expected_partition_seqs - this should normally tell us exactly the same as partition_seqs (see above). Index processes have an optimization where they monitor vbucket database updates and track their current update_seqs, so that when the index needs to know them it doesn't have to consult the database files (which is expensive from a performance perspective). The update_seqs in this field, however, are obtained by consulting each database file directly. If they don't match the corresponding values in partition_seqs, then there is an issue in the view-engine.

  • unindexable_partitions - this field should be non-empty only during rebalance. When vbuckets are in this "unindexable" meta state, index updates ignore them. Transitions to and from this state are used by ns_server to keep views consistent during rebalance. Outside of rebalance, this field should always be empty; if it isn't, then there is an issue somewhere. The value of this field, when non-empty, is an object whose keys are vbucket IDs and values are update_seqs.

Using the information given by this URL (remember, it is on a per-node basis) to check the vbucket states and indexed update_seqs, together with the tools couch_dbinfo and couch_dbdump (run against all active vbucket database files), one can narrow down which component a problem lies in. For example, it is useful for finding out whether the indexes are failing to index the latest data/updates or to process deletes, whether the memcached/ep-engine layer is not persisting data/updates to disk, or whether there is some issue in couchstore (the component that writes to the database files) causing it to not write data, or to write incorrect data, to the database file.
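
As a rough illustration (not an official tool), one could cross-check the three seq-number fields from the _info output to spot both kinds of problems. This sketch assumes the _info JSON for one node was saved to a file named info.json (a name chosen just for the example):

import json

with open("info.json") as f:       # _info output for one node (it is per-node, remember)
    info = json.load(f)

indexed = info["update_seqs"]                # seqs the index has already processed
tracked = info["partition_seqs"]             # seqs the view engine tracks via its optimization
from_db = info["expected_partition_seqs"]    # seqs read directly from the database files

for vb in sorted(indexed, key=int):
    if tracked[vb] != from_db[vb]:
        print(f"vbucket {vb}: view-engine seq tracking out of sync ({tracked[vb]} vs {from_db[vb]})")
    if indexed[vb] < from_db[vb]:
        print(f"vbucket {vb}: index is stale ({indexed[vb]} indexed, {from_db[vb]} in the database)")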

There was an example where using these tools and the information from the URL /_set_view/bucketname/_design/ddocid/_info was very important in finding which component was misbehaving. In that case, Tommie was able to identify that the problem was in ep-engine.