Troubleshooting

This section provides general, non-SDK-specific information about logging, backups and restores, and failover. For more information, refer to the API Reference for your chosen SDK.

Configuring logs

You can configure logging at a few different levels for Couchbase Server:

Couchbase Server logs. The primary source for logging information is the Couchbase Administrative Console. The Couchbase Server installation automatically sets up and starts logging for you. There are also optional, lower-level logs that you can configure. For more information, see Troubleshooting.
SDK-specific log errors. For more information, refer to the Language Reference for your chosen SDK.

Backups and restores

Back up your data on a regular schedule to help ensure that you do not lose it in the event of a major hardware or other system failure.

Note:

Because you typically want to perform a backup and restore with zero system downtime, it is impossible to create a complete point-in-time backup and snapshot of the entire Couchbase Server cluster. In production, Couchbase Server constantly receives requests and updated data, so no snapshot can accurately capture all of the information in the cluster. The same is true of any other database running in production.

Instead, you can perform full backups and incremental backups, and then merge the two to create a time-specific backup. Even so, the resulting backup may not be 100% complete.
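
As a conceptual illustration only, the following Python sketch shows how a full backup and a series of incremental backups could be merged into one time-specific view. It does not use any Couchbase backup tool or file format. The function name merge_backups, the sample keys, and the convention that None marks a deleted key are all assumptions made for this example.

```python
def merge_backups(full, incrementals):
    """Apply incremental backups, oldest first, on top of a full backup.

    `full` maps document keys to values. Each incremental maps a key to its
    new value, or to None to record that the key was deleted (a hypothetical
    convention used only in this sketch).
    """
    merged = dict(full)  # copy so the full backup itself is left untouched
    for delta in incrementals:
        for key, value in delta.items():
            if value is None:
                merged.pop(key, None)  # key was deleted after the full backup
            else:
                merged[key] = value    # key was created or updated
    return merged


# Hypothetical sample data: one full backup followed by two incrementals.
full_backup = {"user::1": "alice", "user::2": "bob"}
incremental_1 = {"user::2": "robert", "user::3": "carol"}
incremental_2 = {"user::1": None}

restored = merge_backups(full_backup, [incremental_1, incremental_2])
print(restored)
```

The merged result reflects the state as of the last incremental backup. Any change made after that backup was taken is not captured, which is why the merged backup may still be incomplete.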

For more information on backups and restores, see Couchbase Server Manual, “Backup and Restore with Couchbase.”

Handling failover

When a Couchbase Server node fails, the other functioning nodes in the cluster continue to process requests and provide responses, and you experience no loss of administrative control. Couchbase SDKs will try to communicate with the failed node, but will receive a message that the requested information cannot be found on that node. The SDK then requests updated cluster information from Couchbase Server and communicates with the nodes that are still active. Because Couchbase Server distributes information across nodes and also stores replica data, information from any failed node still exists in the cluster and an SDK can access it.
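
The request flow described above can be pictured with a small in-memory model. This is a sketch only: ToyCluster, ToyClient, NodeDown, and the node and key names are hypothetical and are not part of any Couchbase SDK. Real SDKs perform the equivalent refresh internally.

```python
class NodeDown(Exception):
    """Raised by this sketch when a request reaches a failed node."""


class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster; not a Couchbase API."""

    def __init__(self, data_by_node, replicas_by_node):
        self.data = data_by_node          # node name -> {key: value}, active data
        self.replicas = replicas_by_node  # node name -> {key: value}, replica data
        self.failed = set()

    def fail(self, node):
        """Mark a node as failed and activate replica copies of its keys."""
        self.failed.add(node)
        for other, copies in self.replicas.items():
            if other in self.failed:
                continue
            for key in self.data[node]:
                if key in copies:
                    self.data[other][key] = copies[key]

    def cluster_map(self):
        """Return key -> node for every key served by a live node."""
        return {
            key: node
            for node, docs in self.data.items()
            if node not in self.failed
            for key in docs
        }

    def read(self, node, key):
        if node in self.failed:
            raise NodeDown(node)
        return self.data[node][key]


class ToyClient:
    """Caches the cluster map and refreshes it when a node turns out to be down."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.map = cluster.cluster_map()

    def get(self, key):
        try:
            return self.cluster.read(self.map[key], key)
        except NodeDown:
            self.map = self.cluster.cluster_map()  # request updated cluster information
            return self.cluster.read(self.map[key], key)


cluster = ToyCluster(
    {"node_a": {"k1": "v1"}, "node_b": {"k2": "v2"}},
    {"node_a": {"k2": "v2"}, "node_b": {"k1": "v1"}},
)
client = ToyClient(cluster)
cluster.fail("node_b")
value = client.get("k2")  # first attempt hits node_b, retry is served by node_a
print(value)
```

The client's first attempt uses its stale map and fails. After refreshing the map, the same key is served from the replica copy on the surviving node.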

There are two ways to handle possible node failures with Couchbase Server:

Auto-failover: You can specify the maximum amount of time a node can be unresponsive before Couchbase Server removes that node from the cluster. For more information, see Couchbase Server Manual, Node Failure.
Manual failover: A person determines that a node is down and then removes the node from the cluster.
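
As an illustration of the auto-failover option, the sketch below builds (but does not send) an HTTP request that sets an auto-failover timeout. It assumes the Couchbase Server REST endpoint /settings/autoFailover on administration port 8091 with the form parameters enabled and timeout. Confirm the endpoint, port, and parameter names against the Couchbase Server Manual for your version before relying on them. The host name, credentials, and the function name build_auto_failover_request are placeholders for this example.

```python
import base64
import urllib.parse
import urllib.request


def build_auto_failover_request(host, user, password, timeout_seconds):
    """Build a POST request that enables auto-failover with the given timeout.

    Assumption: the server exposes /settings/autoFailover on port 8091 and
    accepts the form fields `enabled` and `timeout` (seconds).
    """
    body = urllib.parse.urlencode(
        {"enabled": "true", "timeout": timeout_seconds}
    ).encode("ascii")
    request = urllib.request.Request(
        f"http://{host}:8091/settings/autoFailover",
        data=body,
        method="POST",
    )
    # HTTP Basic authentication with administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder host and credentials; nothing is sent over the network here.
request = build_auto_failover_request("localhost", "Administrator", "password", 30)
print(request.get_method(), request.full_url, request.data)
```

To actually apply the setting, you would pass the request to urllib.request.urlopen against a running cluster.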

In either case, when a node is removed, Couchbase Server automatically redistributes information from that node to all other functioning nodes in the cluster. At this point, however, the remaining nodes do not have replicas established for the additional data. To restore replication, perform a rebalance on the cluster. The rebalance will:

Redistribute stored data across the remaining nodes in the cluster.
Create replica data for all buckets in the cluster.
Provide the new location of data in response to SDK requests.
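
The effect of a rebalance can be sketched with a toy placement function. This is not the placement algorithm Couchbase Server uses. The round-robin scheme, the function name rebalance, and the partition and node names are assumptions chosen only to show data being spread across nodes with each replica held on a different node from its active copy.

```python
def rebalance(partitions, nodes):
    """Assign each partition an active node and a replica node, round-robin."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to hold a replica")
    layout = {}
    for index, partition in enumerate(partitions):
        active = nodes[index % len(nodes)]
        replica = nodes[(index + 1) % len(nodes)]  # next node, never the active one
        layout[partition] = {"active": active, "replica": replica}
    return layout


# Hypothetical cluster of three remaining nodes and six data partitions.
layout = rebalance(range(6), ["node_a", "node_b", "node_c"])
for partition, placement in layout.items():
    print(partition, placement)
```

The returned layout plays the role of the location information that clients need after the rebalance: for each partition, it names the node that now serves it.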

In general, rebalances with Couchbase Server have less of a performance impact than you would expect with a traditional relational database, all other factors, such as the size of the data set, being equal. However, a rebalance increases the overall load and resource utilization for a cluster and leads to some performance loss. Therefore, it is a best practice to perform a rebalance after a node failure during the period of lowest application use, if possible. After a node failure, you can choose one of these options:

Leave the cluster functioning with one less node. Be aware that the cluster must still adequately handle the volume of requests and data with one less node.
If possible, get the failed node functioning again, add it to the cluster, and then rebalance.
Create a new node to replace the failed node, add it to the cluster, and then rebalance.
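
When weighing the first option, a rough estimate of the extra load on each remaining node can help. The sketch below assumes that requests are spread evenly across nodes, which is a simplification. The request rate and node counts are hypothetical figures, not measurements.

```python
def load_per_node(total_ops_per_sec, node_count):
    """Average request rate each node handles, assuming an even spread."""
    return total_ops_per_sec / node_count


# Hypothetical cluster: 40,000 operations per second across four nodes.
before = load_per_node(40000, 4)
after = load_per_node(40000, 3)        # same traffic with one less node
increase = (after - before) / before   # fractional increase per remaining node

print(f"per-node load rises from {before:.0f} to {after:.0f} ops/sec")
print(f"increase per node: {increase:.0%}")
```

If the remaining nodes cannot absorb an increase of this size, restoring or replacing the failed node and then rebalancing is the safer choice.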

For more information about this topic, see Couchbase Server Manual, “Handling a Failover Situation.”