Improving application performance
There are a few main variables that affect application performance and that you can help control and manage:
- Getting cluster sizing correct for your application load,
- Structuring documents for efficient reads and writes,
- Using the SDK methods that are most efficient for the operation you want to perform, and
- Optimizing your use of Couchbase client connections.
Correctly sizing your cluster is one of the most important tasks you need to complete in order to provide good performance. Couchbase Server performs best when you have smaller documents in your data set and when a large majority of the data set is in RAM. This means you need to take into account the size of your application's data set and how much of it will be in active, constant use. This actively used portion of your data is called your ‘working set.’ In general, 99% of your working set should be in RAM, so plan your cluster and size the RAM for your data buckets to hold your working set.
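As a rough illustration of the 99% guideline, the following sketch estimates how much RAM is needed to keep the working set resident. It is a simplified sketch under stated assumptions: the function name and the example figures (10 million documents, a 2,048-byte average size, 20% of the data in active use) are hypothetical, and it deliberately ignores per-item metadata, replica copies, and headroom, which the Sizing Guidelines account for.

```python
import math


def working_set_ram_bytes(total_docs: int, avg_doc_bytes: int,
                          active_fraction: float,
                          resident_target: float = 0.99) -> int:
    """Bytes of RAM needed to keep `resident_target` of the working set in memory.

    Deliberately simplified (hypothetical helper): ignores per-item metadata,
    replica copies, and headroom, which real sizing must add on top.
    """
    working_set = total_docs * avg_doc_bytes * active_fraction
    return math.ceil(working_set * resident_target)


# Hypothetical data set: 10 million documents averaging 2,048 bytes,
# of which 20% are in active, constant use.
needed = working_set_ram_bytes(10_000_000, 2_048, 0.20)
print(f"about {needed / 2**30:.1f} GiB of RAM for the working set")
```

The result is a lower bound for the RAM of your data buckets, not a final cluster size.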
Performing cluster sizing
Before your application goes into production, you will need to determine your cluster size. This includes:
- Determining how many initial nodes are required to support your user base,
- Determining how much capacity you need for data storage and processing in terms of RAM, CPU, disk space, and network bandwidth, and
- Determining the level of performance and availability you want.
For instance, if you want to provide high availability even for a smaller data set, you will need a minimum of three nodes in your cluster. For detailed information about determining cluster and resource sizing, see Couchbase Server Manual: Sizing Guidelines.
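A minimal node-count sketch, assuming you already have a total RAM requirement and a hypothetical per-node RAM quota. It only applies the three-node minimum for high availability described above; CPU, disk, and network capacity are not modeled, and the function name and inputs are illustrative.

```python
GIB = 2**30


def estimate_nodes(ram_needed_bytes: int, ram_quota_per_node_bytes: int,
                   high_availability: bool = True) -> int:
    """Smallest node count whose combined RAM quota covers ram_needed_bytes.

    Hypothetical helper: with high_availability=True the result never drops
    below three nodes, even for a small data set.
    """
    nodes = -(-ram_needed_bytes // ram_quota_per_node_bytes)  # ceiling division
    if high_availability:
        nodes = max(nodes, 3)
    return max(nodes, 1)


# Hypothetical inputs: RAM needed versus 16 GiB of RAM quota per node.
print(estimate_nodes(100 * GIB, 16 * GIB))        # 7
print(estimate_nodes(4 * GIB, 16 * GIB))          # 3 (high-availability minimum)
print(estimate_nodes(4 * GIB, 16 * GIB, False))   # 1
```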
Improving document access
The way you structure documents in Couchbase Server influences how often you need to retrieve them to get the information you want, and therefore influences application performance. Given identical document sizes across your entire data set, it takes more operations to retrieve two documents than one; so in some scenarios you can reduce the number of reads and writes you perform on Couchbase Server by reading and writing one document instead of many. Structuring your documents with this in mind improves application performance.
The following returns to our beer application example and illustrates the additional operations you would need to perform if you used separate documents. In this case, suppose our beer application has a ‘leader board’ that lists the top 10 best-selling beers in our application. Imagine what this leader board document would look like:
```
{
  "board_id": 222,
  "leader_board": "best selling",
  "top_sales": [ { "beer_id": 75623 },
                 { "beer_id": 98756 },
                 { "beer_id": 2938 },
                 { "beer_id": 49283 },
                 { "beer_id": 204857 },
                 { "beer_id": 12345 },
                 { "beer_id": 23456 },
                 { "beer_id": 56413 },
                 { "beer_id": 24645 },
                 { "beer_id": 34502 }
  ],
  "updated": "2010-07-22 20:00:20"
}
```
In the example document above, we store references to the top-selling beers in the ‘top_sales’ array. A single beer in that list could look like this document:
```
{
  "beer_id": 75623,
  "name": "Pleny the Felder",
  "type": "wheat",
  "aroma": "wheaty",
  "category": "koelsch",
  "units_sold": 37011,
  "brewery": "brewery_Legacy_Brewing_Co"
}
```
If we use this approach, we need to 1) retrieve the leader board document from Couchbase Server, 2) go through each element in the ‘top_sales’ array and retrieve each beer from Couchbase Server, and 3) read the ‘units_sold’ value from each beer document. Consider the alternative, a single leader board document that embeds the relevant sales data for each beer:
```
{
  "board_id": 222,
  "leader_board": "best selling",
  "top_sales": [ { "beer_id": 75623, "units_sold": 37011, "name": "Pleny the Felder" },
                 { "beer_id": 98756, "units_sold": 23002, "name": "Sub-Hoptimus" },
                 { "beer_id": 2938, "units_sold": 23001, "name": "Speckled Hen" },
                 { "beer_id": 49283, "units_sold": 11023, "name": "Happy Hops" },
                 { "beer_id": 204857, "units_sold": 9856, "name": "Bruxulle Rouge" },
                 { "beer_id": 12345, "units_sold": 7654, "name": "Plums Pilsner" },
                 { "beer_id": 23456, "units_sold": 7112, "name": "Humble Amber Lager" },
                 { "beer_id": 56413, "units_sold": 6723, "name": "Hermit Dopplebock" },
                 { "beer_id": 24645, "units_sold": 6409, "name": "IAM Lambic" },
                 { "beer_id": 34502, "units_sold": 5012, "name": "Inlaws Special Bitter" }
  ],
  "updated": "2010-07-22 20:00:20"
}
```
In this case, we only need to perform a single request to get the leader board document from Couchbase Server. Then, within our application logic, we can read each of the leading beers from that document. Instead of eleven database requests (one for the leader board plus one for each of the ten beers), we make a single request, which consumes far less time and fewer resources than multiple server requests. So when you create or modify document structures, keep this approach in mind.
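To make the request counts concrete, here is a small Python sketch that models both layouts. CountingStore is a hypothetical in-memory stand-in for a bucket, not an SDK class; the "beer::<id>" and "board::222" key names and the beer names and sales figures are placeholders chosen for illustration. The point is only that the referencing layout costs one request for the board plus one per beer, while the embedded layout costs one.

```python
from typing import Any, Dict, List, Tuple


class CountingStore:
    """Hypothetical in-memory stand-in for a bucket that counts get() requests."""

    def __init__(self, docs: Dict[str, Any]):
        self._docs = docs
        self.requests = 0

    def get(self, key: str) -> Any:
        self.requests += 1
        return self._docs[key]


def top_sales_normalized(store: CountingStore, board_key: str) -> List[Tuple[str, int]]:
    board = store.get(board_key)                       # 1 request for the board
    result = []
    for ref in board["top_sales"]:                     # +1 request per beer
        beer = store.get(f"beer::{ref['beer_id']}")
        result.append((beer["name"], beer["units_sold"]))
    return result


def top_sales_embedded(store: CountingStore, board_key: str) -> List[Tuple[str, int]]:
    board = store.get(board_key)                       # the only request
    return [(b["name"], b["units_sold"]) for b in board["top_sales"]]


ids = [75623, 98756, 2938, 49283, 204857, 12345, 23456, 56413, 24645, 34502]
# Placeholder beer documents keyed by a hypothetical "beer::<id>" scheme.
beers = {f"beer::{i}": {"name": f"beer {i}", "units_sold": 1000 - n}
         for n, i in enumerate(ids)}

normalized = CountingStore(
    {**beers, "board::222": {"top_sales": [{"beer_id": i} for i in ids]}})
embedded = CountingStore(
    {"board::222": {"top_sales": [{"beer_id": i, **beers[f"beer::{i}"]} for i in ids]}})

same = (top_sales_normalized(normalized, "board::222")
        == top_sales_embedded(embedded, "board::222"))
assert same
print(normalized.requests, embedded.requests)   # 11 1
```

The trade-off is that the embedded names and sales figures are copies, so the leader board document has to be rewritten whenever they change.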
Using the fastest methods
Several Couchbase SDK APIs are considered ‘convenience’ methods because they provide commonly used functionality in a single method call. They tend to be less resource-intensive than the series of get()/set() calls you would otherwise have to perform to achieve the same result, because they typically let you perform an operation in a single request to Couchbase Server instead of two. The following is a summary of recommended alternative calls:
- Multi-Get/Bulk-Get: When you want to retrieve multiple items and have all of their keys, a multi-get retrieves all of the items in a single request instead of one request per key. It is therefore faster and less resource-intensive than performing individual, sequential get() calls. The following demonstrates a multi-get in Ruby:
```
keys = ["foo", "bar", "baz"]
# alternate method signatures for multi-get
conn.get(keys)
conn.get(["foo", "bar", "baz"])
conn.get("foo", "bar", "baz")
```
All of the keys we provide are sent in a single request, and Couchbase Server returns a single response containing every key that exists. Consult the API documentation for your chosen SDK to find out more about the specific method call for multi-gets.
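The following Python sketch models the difference in round trips. FakeServer and its get_multi() method are hypothetical stand-ins, not the API of any Couchbase SDK; as described above, the bulk reply contains only the keys that exist.

```python
from typing import Dict, Iterable, Optional


class FakeServer:
    """Hypothetical stand-in for Couchbase Server that counts round trips."""

    def __init__(self, data: Dict[str, str]):
        self.data = data
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys: Iterable[str]) -> Dict[str, str]:
        # One request carries every key; the reply holds only keys that exist.
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


server = FakeServer({"foo": "1", "bar": "2", "baz": "3"})
keys = ["foo", "bar", "baz", "missing"]

sequential = {}
for key in keys:                      # one round trip per key
    value = server.get(key)
    if value is not None:
        sequential[key] = value
trips_sequential = server.round_trips

bulk = server.get_multi(keys)         # one round trip for all keys
trips_bulk = server.round_trips - trips_sequential

assert bulk == sequential
print(trips_sequential, trips_bulk)   # 4 1
```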
- Increment/Decrement: These two convenience methods let you update a numeric value without calling get() and set(). Normally, to increment or decrement an integer you would need to 1) retrieve it with a request to Couchbase Server, 2) add an amount to the value if it exists, or set it to an initial value otherwise, and 3) store the value in Couchbase Server with a second request. With the increment and decrement methods, you can perform all three steps in a single method call, as we do in this Ruby SDK example:
```
client.increment("score", :delta => 1, :initial => 100)
```
In this example we provide a key and two options: :delta is the amount to increment by, and :initial is the value to store if the key does not already exist. If the key is not found, Couchbase Server stores the initial value but does not increment or decrement it as part of the operation. Most Couchbase SDKs take the same three pieces of information (the key to increment or decrement, an initial value to use if the key does not exist, and the amount by which Couchbase Server increments or decrements the existing value), although parameter order and naming vary by SDK. In a single request and response, increment and decrement create the key if it does not exist and otherwise increment or decrement its value. Over thousands or millions of documents, this approach improves application performance compared to using get()/set() to achieve the same result.
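The following Python sketch contrasts the two approaches. FakeCounterStore is a hypothetical in-memory stand-in that counts requests; its increment() follows the behavior described above (a missing key is set to the initial value without applying the delta), and the parameter names delta and initial are borrowed from the Ruby example rather than from any particular Python SDK.

```python
class FakeCounterStore:
    """Hypothetical in-memory store that counts requests to the server."""

    def __init__(self):
        self.data = {}
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return self.data.get(key)

    def set(self, key, value):
        self.requests += 1
        self.data[key] = value

    def increment(self, key, delta=1, initial=0):
        # One request: store `initial` if the key is missing (the delta is
        # not applied in that case), otherwise add `delta` to the value.
        self.requests += 1
        if key not in self.data:
            self.data[key] = initial
        else:
            self.data[key] += delta
        return self.data[key]


def increment_with_get_set(store, key, delta=1, initial=0):
    """The get()/set() equivalent: two requests per update, and another
    client could change the value between them."""
    current = store.get(key)
    new_value = initial if current is None else current + delta
    store.set(key, new_value)
    return new_value


a, b = FakeCounterStore(), FakeCounterStore()
for _ in range(3):
    increment_with_get_set(a, "score", delta=1, initial=100)
    b.increment("score", delta=1, initial=100)

assert a.data == b.data == {"score": 102}
print(a.requests, b.requests)   # 6 3
```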
- Prepend and append: These two methods provide the functional equivalent of 1) retrieving a key from Couchbase Server with a request, 2) adding binary content to the beginning or end of the value, and 3) making a second request to Couchbase Server to store the updated value. With prepend and append, you can perform these three steps in a single request to Couchbase Server. The following illustrates this in Python. To see the full example, including encoding and decoding the data, see Maintaining a Set:
```
def modify(cb, indexName, op, keys):
    # encodeSet() is defined in the full Maintaining a Set example
    encoded = encodeSet(keys, op)
    try:
        cb.append(indexName, encoded)
    except KeyError:
        # If we can't append, and we're adding to the set,
        # we are trying to create the index, so do that.
        if op == '+':
            cb.add(indexName, encoded)

def add(cb, indexName, *keys):
    """Add the given keys to the given set."""
    modify(cb, indexName, '+', keys)

def remove(cb, indexName, *keys):
    """Remove the given keys from the given set."""
    modify(cb, indexName, '-', keys)
```
This example manages a set of keys, such as 'a', 'b', and 'c', and uses append to record whether a given key is included in the set or not. For instance, if the stored value reads +a +b +c -b, it represents the set {a, c}. The modify() method takes a Couchbase client object, the name of a set, an operator, and keys. It tries to append the keys, each prefixed with the operator, to the named set; because append fails if the set does not exist yet, modify() falls back to add() to create the set when the operator is '+'.
Compared with making a get() call, appending the new entries to the end of the value in the application, and then saving the document back to Couchbase Server with another request, we accomplish the same thing in a single request. Once again, you improve application performance by substituting a single append or prepend for a get()/set() sequence; this is particularly true if you perform the operation on thousands or millions of documents.
Append() and prepend() add raw serialized data to the existing data for a key. Couchbase Server treats the existing value as a binary stream and concatenates the new content to either its beginning or its end. Hierarchical or otherwise non-linear formats stored in the database simply have the new information added at the start or end; there is no logic that inserts the information at a particular place within a stored document structure or object.
Therefore, if you have a serialized object in Couchbase Server and you append or prepend to it, the serialized object itself will not be extended. For instance, if you append() an integer to an array stored in Couchbase Server, the record will contain the serialized array followed by the serialized integer, not an array with one more element.
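To illustrate both the set-encoding technique and the caveat about serialized objects, here is a self-contained Python sketch. FakeAppendStore, encode_set(), and decode_set() are hypothetical helpers written for this illustration; they are not the encodeSet() from the Maintaining a Set example, and a real bucket stores bytes rather than Python strings.

```python
import json


class FakeAppendStore:
    """Hypothetical in-memory bucket supporting add() and append()."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        if key in self.data:                 # in this sketch, add() only creates keys
            raise KeyError(key)
        self.data[key] = value

    def append(self, key, value):
        if key not in self.data:             # append() needs an existing value
            raise KeyError(key)
        self.data[key] += value              # raw concatenation, no parsing


def encode_set(keys, op):
    """Encode keys as '+a +b ' or '-b ' tokens (hypothetical format)."""
    return "".join(f"{op}{key} " for key in keys)


def decode_set(raw):
    """Replay the +/- tokens in order to recover the current members."""
    members = set()
    for token in raw.split():
        op, key = token[0], token[1:]
        if op == "+":
            members.add(key)
        else:
            members.discard(key)
    return members


cb = FakeAppendStore()
cb.add("myset", encode_set(["a", "b", "c"], "+"))   # creates "+a +b +c "
cb.append("myset", encode_set(["b"], "-"))          # a single append, no get()
print(sorted(decode_set(cb.data["myset"])))         # ['a', 'c']

# Appending to a serialized array does not extend the array:
cb.add("scores", json.dumps([1, 2, 3]))
cb.append("scores", json.dumps(4))
print(cb.data["scores"])                            # [1, 2, 3]4
try:
    json.loads(cb.data["scores"])
except json.JSONDecodeError:
    print("no longer valid JSON")
```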
Optimizing client instances
You create a new connection to Couchbase Server from an SDK by creating an instance of the Couchbase client. Creating this object is one of the more resource-intensive operations you can perform with the SDKs.
When you create a new connection, Couchbase Server needs to provide the current server topology to the client instance, and it may also need to perform authentication. All of this takes more time and resources than a read or write over a connection that already exists. For this reason, you should minimize the number of connections you create and reuse existing connections as much as possible.
Approaches to connection reuse differ between SDKs: some SDKs use a connection pool, while others rely on reusing a single connection. Refer to the Language Reference for your SDK for information on how to implement this. Another approach is to handle multiple requests from a single, persistent client instance, as discussed in the next section.
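The following Python sketch shows the general idea behind a connection pool: clients are created lazily, at most max_size of them, and returned to the pool after each use so that later requests reuse them instead of paying the setup cost again. ExpensiveClient and ClientPool are hypothetical; the SDKs implement pooling internally in their own ways, so treat this as an illustration of the pattern rather than of any SDK's API.

```python
import queue
import threading
from contextlib import contextmanager


class ExpensiveClient:
    """Hypothetical client; its constructor stands in for the costly
    bootstrap (fetching topology, authenticating)."""

    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with ExpensiveClient._lock:
            ExpensiveClient.created += 1

    def get(self, key):
        return f"value of {key}"


class ClientPool:
    """Hands out at most max_size clients, creating them only when needed."""

    def __init__(self, factory, max_size=4):
        self._factory = factory
        self._idle = queue.LifoQueue()               # clients ready for reuse
        self._slots = threading.Semaphore(max_size)  # caps concurrent checkouts

    @contextmanager
    def client(self):
        self._slots.acquire()                 # wait if max_size clients are in use
        try:
            try:
                c = self._idle.get_nowait()   # reuse an idle client if there is one
            except queue.Empty:
                c = self._factory()           # otherwise pay the setup cost once
            try:
                yield c
            finally:
                self._idle.put(c)             # return it for the next caller
        finally:
            self._slots.release()


pool = ClientPool(ExpensiveClient, max_size=4)
for i in range(100):
    with pool.client() as c:
        c.get(f"key{i}")
print(ExpensiveClient.created)   # 1
```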
Maintaining persistent connections
Couchbase SDKs support persistent connections, which enable you to send multiple requests and receive multiple responses over the same connection. How persistent connections are implemented varies by SDK. Here are the respective approaches you can use:
- PHP: Persistent connections for PHP clients are actually persistent memory that is used across multiple requests within a PHP process. Typically there is one PHP process per system process, as determined by the web server in use on your system. To configure the PHP SDK to maintain a persistent connection, pass these parameters when you connect:
```
$cb = new Couchbase("192.168.1.200:8091", "default", "", "default", true); // uses the default bucket
```
This example uses the default bucket. The arguments are host:port, username, password, bucket name, and a flag where true indicates that we want a persistent connection.
- Java: When you create a connection with the Java SDK, the connection is a thread-safe object that can be shared across multiple threads. Alternatively, you can create a connection pool that contains multiple connection objects.
- .NET: Connections that you create with the .NET SDK are also thread-safe objects; for persistent connections, you can use a connection pool that contains multiple connection objects. You should create only a single static instance of a Couchbase client per bucket, in accordance with the .NET Framework. The persistent client maintains connection pools per server node. For more information, see MSDN: AppDomain Class.
- Ruby: You can persist a Couchbase client by storing it in a way that prevents the Ruby garbage collector from removing it from memory. To do this, you can create a singleton object that contains the client instance and the connection information. Use the class-level method Couchbase.bucket instead of Couchbase.connect to get the client instance.
When you call Couchbase.bucket, it creates a new client object the first time and then stores the object in thread-local storage. If the thread is still alive when the next request is made to the Ruby process, the SDK does not create a new client instance but reuses the existing one:
```
# Simple example to connect using thread local singleton
Couchbase.connection_options = {
:bucket => "my",
:hostname => "example.com",
:password => "secret"
}
# This call will use connection_options to initialize a new connection.
# By default, Couchbase.connection_options can be empty.
Couchbase.bucket.set("foo", "bar")
# Amend the options of the singleton connection in run-time
Couchbase.bucket.reconnect(:bucket => "another")
```
The first statement sets the options for the thread-local singleton connection, the second uses the class-level Couchbase.bucket method to create (or reuse) the persistent connection and store a value, and the last shows how to change the properties of the singleton connection at run time by reconnecting.
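The thread-local pattern that Couchbase.bucket follows can be sketched in Python with threading.local. HypotheticalClient, bucket(), and reconnect() here are illustrative names that mimic the Ruby calls above; they are not part of a Couchbase Python SDK.

```python
import threading


class HypotheticalClient:
    """Stand-in for an SDK client; creating one is the expensive step."""

    def __init__(self, **options):
        self.options = options


connection_options = {"bucket": "my", "hostname": "example.com", "password": "secret"}
_local = threading.local()   # each thread sees its own attributes


def bucket() -> HypotheticalClient:
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = HypotheticalClient(**connection_options)
        _local.client = client
    return client


def reconnect(**overrides) -> HypotheticalClient:
    """Replace this thread's client with one built from amended options."""
    _local.client = HypotheticalClient(**{**connection_options, **overrides})
    return _local.client


first = bucket()
assert bucket() is first              # same thread: the client is reused

other = []
worker = threading.Thread(target=lambda: other.append(bucket()))
worker.start()
worker.join()
assert other[0] is not first          # a different thread gets its own client

reconnect(bucket="another")
print(bucket().options["bucket"])     # another
```

In this sketch reconnect() only affects the calling thread's client; the shared connection_options stay unchanged for other threads.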
For more information about persistent connections for an SDK, see the individual Language Reference for your chosen SDK.