vBuckets

A vBucket is defined as the owner of a subset of the key space of a Couchbase cluster. These vBuckets are used to distributed information effectively across a cluster.

The vBucket system is used both for distributing data and for supporting replicas (copies of bucket data) on more than one node. vBuckets are not a user-accessible component, but they are a critical component of Couchbase Server and are vital to the availability support and elastic nature.

Clients access the information stored in a bucket by communicating directly with the node responsible for the corresponding vBucket. This direct access enables clients to communicate with the node storing the data, rather than using a proxy or redistribution architecture. The result abstracts the physical topology from the logical partitioning of data. This architecture gives Couchbase Server elasticity and flexibility

Every document ID belongs to a vBucket. A mapping function is used to calculate the vBucket in which a given document belongs. In Couchbase Server, that mapping function is a hashing function that takes a document ID as input and outputs a vBucket identifier. Once the vBucket identifier has been computed, a table is consulted to lookup the server that “hosts” that vBucket. The table contains one row per vBucket, pairing the vBucket to its hosting server. A server appearing in this table can be (and usually is) responsible for multiple vBuckets.

The following diagrams shows how the Key to Server mapping (vBucket map) works.

In this scenario, there are three servers in the cluster and client wants to look up the value of KEY using the GET operation.

  1. The client first hashes the key to calculate the vBucket which owns KEY. In this example, the hash resolves to vBucket 8 (vB8).
  2. By examining the vBucket map, the client determines Server C hosts vB8.
  3. The client sends the GET operation directly to Server C.

In the next scenario, a server added to the original cluster of three. A new node, Server D, is added to the cluster and the vBucket Map is updated (during the rebalance operation). The updated map is then sent to all the cluster participants including the other nodes, any connected “smart” clients, and the Moxi proxy service.

Within the new four-node cluster model, when a client again wants to determine the value of KEY using the GET operation:

  • The hashing algorithm still resolves to vBucket 8 (vB8).
  • The new vBucket Map now maps vBucket 8 to Server D.
  • The client sends the GET operation directly to Server D.
Note: This architecture permits Couchbase Server to cope with changes without using the typical RDBMS sharding method. In addition, the architecture differs from the method used by memcached, which uses client-side key hashes to determine the server from a defined list. The memcached method requires active management of the list of servers and specific hashing algorithms such as Ketama to cope with changes to the topology.