View and query examples
Building views and querying the indexes they generate is a combined process based both on the document structure and the view definition. Writing an effective view to query your data may require changing or altering your document structure, or creating a more complex view in order to allow the specific selection of the data through the querying mechanism.
For background and examples, the following selections provide a number of different scenarios and examples have been built to demonstrate the document structures, views and querying parameters required for different situations.
General advice
There are some general points and advice for writing all views that apply irrespective of the document structure, query format, or view content.
- Do not assume the field will exist in all documents.
Fields may be missing from your document, or may only be supported in specific document types. Use an if test to identify problems. For example:
if (document.firstname)...
- View output is case sensitive.
The value emitted by the emit() function is case sensitive. Emitting a field value of ‘Martin’ but specifying a key value of ‘martin’ will not match the data. Emitted data, and the key selection values, should be normalized to eliminate potential problems. For example:
emit(doc.firstname.toLowerCase(),null);
- Number formatting
Numbers within JavaScript may inadvertently be converted and output as strings. To ensure that data is correctly formatted, the value should be explicitly converted. For example:
emit(parseInt(doc.value,10),null);
The parseInt() built-in function will convert a supplied value to an integer. The parseFloat() function can be used for floating-point numbers.
Validating document type
If your dataset includes documents that may be either JSON or binary, then you do not want to create a view that outputs individual fields for non-JSON documents. You can fix this by using a view that checks the metadata type field before outputting the JSON view information:
function(doc,meta) {
if (meta.type == "json") {
emit(doc.firstname.toLowerCase(),null);
}
}
In the above example, the emit() function will only be called on a valid JSON document. Non-JSON documents will be ignored and not included in the view output.
Document ID (primary) index
To create a ‘primary key’ index, i.e. an index that contains a list of every document within the database, with the document ID as the key, you can create a simple view:
function(doc,meta)
{
emit(meta.id,null);
}
This enables you to iterate over the documents stored in the database.
This will provide you with a view that outputs the document ID of every document in the bucket using the document ID as the key.
The view can be useful for obtaining groups or ranges of documents based on the document ID, for example to get documents with a specific ID prefix:
?startkey="object"&endkey="object\u0000"
Or to obtain a list of objects within a given range:
?startkey="object100"&endkey="object199"
For all views, the document ID is automatically included as part of the view response. But the without including the document ID within the key emitted by the view, it cannot be used as a search or querying mechanism.
Secondary index
The simplest form of view is to create an index against a single field from the documents stored in your database.
For example, given the document structure:
{
"firstname": "Martin",
"lastname": "Brown"
}
A view to support queries on the firstname field could be defined as follows:
function(doc, meta)
{
if (doc.firstname)
{
emit(doc.firstname.toLowerCase(),null);
}
}
The view works as follows for each document:
Only outputs a record if the document contains a firstname field.
Converts the content of the firstname field to lowercase.
Queries can now be specified by supplying a string converted to lowercase. For example:
?key="martin"
Will return all documents where the firstname field contains ‘Martin’, regardless of the document field capitalization.
Using expiration metadata
The metadata object makes it very easy to create and update different views on your data using information outside of the main document data. For example, you can use the expiration field within a view to get the list of recently active sessions in a system.
Using the following map() function, which uses the expiration as part of the emitted data.
function(doc, meta)
{
if (doc.type && doc.type == "session")
{
emit(meta.expiration, doc.nickname)
}
}
If you have sessions which are saved with a TTL, this will allow you to give a view of who was recently active on the service.
Emitting multiple rows
The emit() function is used to create a record of information for the view during the map phase, but it can be called multiple times within that map phase to allowing querying over more than one source of information from each stored document.
An example of this is when the source documents contain an array of information. For example, within a recipe document, the list of ingredients is exposed as an array of objects. By iterating over the ingredients, an index of ingredients can be created and then used to find recipes by ingredient.
{
"title": "Fried chilli potatoes",
"preptime": "5"
"servings": "4",
"totaltime": "10",
"subtitle": "A new way with chips.",
"cooktime": "5",
"ingredients": [
{
"ingredtext": "chilli powder",
"ingredient": "chilli powder",
"meastext": "3-6 tsp"
},
{
"ingredtext": "potatoes, peeled and cut into wedges",
"ingredient": "potatoes",
"meastext": "900 g"
},
{
"ingredtext": "vegetable oil for deep frying",
"ingredient": "vegetable oil for deep frying",
"meastext": ""
}
],
}
The view can be created using the following map() function:
function(doc, meta)
{
if (doc.ingredients)
{
for (i=0; i < doc.ingredients.length; i++)
{
emit(doc.ingredients[i].ingredient, null);
}
}
}
To query for a specific ingredient, specify the ingredient as a key:
?key="carrot"
The keys parameter can also be used in this situation to look for recipes that contain multiple ingredients. For example, to look for recipes that contain either “potatoes” or “chilli powder” you would use:
?keys=["potatoes","chilli powder"]
This will produce a list of any document containing either ingredient. A simple count of the document IDs by the client can determine which recipes contain all three.
The output can also be combined. For example, to look for recipes that contain carrots and can be cooked in less than 20 minutes, the view can be rewritten as:
function(doc, meta)
{
if (doc.ingredients)
{
for (i=0; i < doc.ingredients.length; i++)
{
if (doc.ingredients[i].ingredtext && doc.totaltime)
{
emit([doc.ingredients[i].ingredtext, parseInt(doc.totaltime,10)], null);
}
}
}
}
In this map function, an array is output that generates both the ingredient name, and the total cooking time for the recipe. To perform the original query, carrot recipes requiring less than 20 minutes to cook:
?startkey=["carrot",0]&endkey=["carrot",20]
This generates the following view:
{"total_rows":26471,"rows":[
{"id":"Mangoandcarrotsmoothie","key":["carrots",5],"value":null},
{"id":"Cheeseandapplecoleslaw","key":["carrots",15],"value":null}
]
}
Date and time selection
For date and time selection, consideration must be given to how the data will need to be selected when retrieving the information. This is particularly true when you want to perform log roll-up or statistical collection by using a reduce function to count or quantify instances of a particular event over time.
Examples of this in action include querying data over a specific range, on specific day or date combinations, or specific time periods. Within a traditional relational database it is possible to perform an extraction of a specific date or date range by storing the information in the table as a date type.
Within a map/reduce, the effect can be simulated by exposing the date into the individual components at the level of detail that you require. For example, to obtain a report that counts individual log types over a period identifiable to individual days, you can use the following map() function:
function(doc, meta) {
emit([doc.year, doc.mon, doc.day, doc.logtype], null);
}
By incorporating the full date into the key, the view provides the ability to search for specific dates and specific ranges. By modifying the view content you can simplify this process further. For example, if only searches by year/month are required for a specific application, the day can be omitted.
And with the corresponding reduce() built-in of _count, you can perform a number of different queries. Without any form of data selection, for example, you can use the group_level parameter to summarize down as far as individual day, month, and year. Additionally, because the date is explicitly output, information can be selected over a specific range, such as a specific month:
endkey=[2010,9,30]&group_level=4&startkey=[2010,9,0]
Here the explicit date has been specified as the start and end key. The group_level is required to specify roll-up by the date and log type.
This will generate information similar to this:
{"rows":[
{"key":[2010,9,1,"error"],"value":5},
{"key":[2010,9,1,"warning"],"value":10},
{"key":[2010,9,2,"error"],"value":8},
{"key":[2010,9,2,"warning"],"value":9},
{"key":[2010,9,3,"error"],"value":16},
{"key":[2010,9,3,"warning"],"value":8},
{"key":[2010,9,4,"error"],"value":15},
{"key":[2010,9,4,"warning"],"value":11},
{"key":[2010,9,5,"error"],"value":6},
{"key":[2010,9,5,"warning"],"value":12}
]
}
Additional granularity, for example down to minutes or seconds, can be achieved by adding those as further arguments to the map function:
function(doc, meta)
{
emit([doc.year, doc.mon, doc.day, doc.hour, doc.min, doc.logtype], null);
}
The same trick can also be used to output based on other criteria. For example, by day of the week, week number of the year or even by period:
function(doc, meta) {
if (doc.mon)
{
var quarter = parseInt((doc.mon - 1)/3,10)+1;
emit([doc.year, quarter, doc.logtype], null);
}
}
To get more complex information, for example a count of individual log types for a given date, you can combine the map() and reduce() stages to provide the collation.
For example, by using the following map() function we can output and collate by day, month, or year as before, and with data selection at the date level.
function(doc, meta) {
emit([doc.year, doc.mon, doc.day], doc.logtype);
}
For convenience, you may wish to use the dateToArray() function, which converts a date object or string into an array. For example, if the date has been stored within the document as a single field:
function(doc, meta) {
emit(dateToArray(doc.date), doc.logtype);
}
For more information, see dateToArray().
Using the following reduce() function, data can be collated for each individual logtype for each day within a single record of output.
function(key, values, rereduce)
{
var response = {"warning" : 0, "error": 0, "fatal" : 0 };
for(i=0; i<values.length; i++)
{
if (rereduce)
{
response.warning = response.warning + values[i].warning;
response.error = response.error + values[i].error;
response.fatal = response.fatal + values[i].fatal;
}
else
{
if (values[i] == "warning")
{
response.warning++;
}
if (values[i] == "error" )
{
response.error++;
}
if (values[i] == "fatal" )
{
response.fatal++;
}
}
}
return response;
}
When queried using a group_level of two (by month), the following output is produced:
{"rows":[
{"key":[2010,7], "value":{"warning":4,"error":2,"fatal":0}},
{"key":[2010,8], "value":{"warning":4,"error":3,"fatal":0}},
{"key":[2010,9], "value":{"warning":4,"error":6,"fatal":0}},
{"key":[2010,10],"value":{"warning":7,"error":6,"fatal":0}},
{"key":[2010,11],"value":{"warning":5,"error":8,"fatal":0}},
{"key":[2010,12],"value":{"warning":2,"error":2,"fatal":0}},
{"key":[2011,1], "value":{"warning":5,"error":1,"fatal":0}},
{"key":[2011,2], "value":{"warning":3,"error":5,"fatal":0}},
{"key":[2011,3], "value":{"warning":4,"error":4,"fatal":0}},
{"key":[2011,4], "value":{"warning":3,"error":6,"fatal":0}}
]
}
The input includes a count for each of the error types for each month. Note that because the key output includes the year, month and date, the view also supports explicit querying while still supporting grouping and roll-up across the specified group. For example, to show information from 15th November 2010 to 30th April 2011 using the following query:
?endkey=[2011,4,30]&group_level=2&startkey=[2010,11,15]
Which generates the following output:
{"rows":[
{"key":[2010,11],"value":{"warning":1,"error":8,"fatal":0}},
{"key":[2010,12],"value":{"warning":3,"error":4,"fatal":0}},
{"key":[2011,1],"value":{"warning":8,"error":2,"fatal":0}},
{"key":[2011,2],"value":{"warning":4,"error":7,"fatal":0}},
{"key":[2011,3],"value":{"warning":4,"error":4,"fatal":0}},
{"key":[2011,4],"value":{"warning":5,"error":7,"fatal":0}}
]
}
Keep in mind that you can create multiple views to provide different views and queries on your document data. In the above example, you could create individual views for the limited datatypes of logtype to create a warningsbydate view.
Selective record output
If you are storing different document types within the same bucket, then you may want to ensure that you generate views only on a specific record type within the map() phase. This can be achieved by using an if statement to select the record.
For example, if you are storing blog ‘posts’ and ‘comments’ within the same bucket, then a view on the blog posts could be created using the following map:
function(doc, meta) {
if (doc.title && doc.type && doc.date &&
doc.author && doc.type == 'post')
{
emit(doc.title, [doc.date, doc.author]);
}
}
The same solution can also be used if you want to create a view over a specific range or value of documents while still allowing specific querying structures. For example, to filter all the records from the statistics logging system over a date range that are of the type error you could use the following map() function:
function(doc, meta) {
if (doc.logtype == 'error')
{
emit([doc.year, doc.mon, doc.day],null);
}
}
The same solution can also be used for specific complex query types. For example, all the recipes that can be cooked in under 30 minutes, made with a specific ingredient:
function(doc, meta)
{
if (doc.totaltime && doc.totaltime <= 20)
{
if (doc.ingredients) {
for (i=0; i < doc.ingredients.length; i++)
{
if (doc.ingredients[i].ingredtext)
{
emit(doc.ingredients[i].ingredtext, null);
}
}
}
}
}
The above function provides for much quicker and simpler selection of recipes by using a query and the key parameter, instead of having to work out the range that may be required to select recipes when the cooking time and ingredients are generated by the view.
These selections are application specific, but by producing different views for a range of appropriate values, for example 30, 60, or 90 minutes, recipe selection can be much easier at the expense of updating additional view indexes.
Sorting on reduce values
The sorting algorithm within the view system outputs information ordered by the generated key within the view, and therefore it operates before any reduction takes place. Unfortunately, it is not possible to sort the output order of the view on computed reduce values, as there is no post-processing on the generated view information.
To sort based on reduce values, you must access the view content with reduction enabled from a client, and perform the sorting within the client application.
Solutions for simulating joins
Joins between data, even when the documents being examined are contained within the same bucket, are not possible directly within the view system. However, you can simulate this by making use of a common field used for linking when outputting the view information. For example, consider a blog post system that supports two different record types, ‘blogpost’ and ‘blogcomment’. The basic format for ‘blogpost’ is:
{
"type" : "post",
"title" : "Blog post"
"categories" : [...],
"author" : "Blog author"
...
}
The corresponding comment record includes the blog post ID within the document structure:
{
"type" : "comment",
"post_id" : "post_3454"
"author" : "Comment author",
"created_at" : 123498235
...
}
To output a blog post and all the comment records that relate to the blog post, you can use the following view:
function(doc, meta)
{
if (doc.post_id && doc.type && doc.type == "post")
{
emit([doc.post_id, null], null);
}
else if (doc.post_id && doc.created_at && doc.type && doc.type == "comment")
{
emit([doc.post_id, doc.created_at], null);
}
}
The view makes use of the sorting algorithm when using arrays as the view key. For a blog post record, the document ID will be output will a null second value in the array, and the blog post record will therefore appear first in the sorted output from the view. For a comment record, the first value will be the blog post ID, which will cause it to be sorted in line with the corresponding parent post record, while the second value of the array is the date the comment was created, allowing sorting of the child comments.
For example:
{"rows":[
{"key":["post_219",null], "value":{...}},
{"key":["post_219",1239875435],"value":{...}},
{"key":["post_219",1239875467],"value":{...}},
]
}
Another alternative is to make use of a multi-get operation within your client through the main Couchbase SDK interface, which should load the data from cache. This lets you structure your data with the blog post containing an array of the of the child comment records. For example, the blog post structure might be:
{
"type" : "post",
"title" : "Blog post"
"categories" : [...],
"author" : "Blog author",
"comments": ["comment_2298","comment_457","comment_4857"],
...
}
To obtain the blog post information and the corresponding comments, create a view to find the blog post record, and then make a second call within your client SDK to get all the comment records from the Couchbase Server cache.
Simulating transactions
Couchbase Server does not support transactions, but the effect can be simulated by writing a suitable document and view definition that produces the effect while still only requiring a single document update to be applied.
For example, consider a typical banking application, the document structure could be as follows:
{
"account" : "James",
"value" : 100
}
A corresponding record for another account:
{
"account" : "Alice",
"value" : 200
}
To get the balance of each account, the following map() :
function(doc, meta) {
if (doc.account && doc.value)
{
emit(doc.account,doc.value);
}
}
The reduce() function can use the built-in _sum function.
When queried, using a group_level of 1, the balance of the accounts is displayed:
{"rows":[
{"key":"Alice","value":200},
{"key":"James","value":100}
]
}
Money in an account can be updated just by adding another record into the system with the account name and value. For example, adding the record:
{
"account" : "James",
"value" : 50
}
Re-querying the view produces an updated balance for each account:
{"rows":[
{"key":"Alice","value":200},
{"key":"James","value":150}
]
}
However, if Alice wants to transfer $100 to James, two record updates are required:
- A record that records an update to Alice’s account to reduce the value by 100.
- A record that records an update to James’s account to increase the value by 100.
Unfortunately, the integrity of the transaction could be compromised in the event of a problem between step 1 and step 2. Alice’s account may be deducted, without updates James’ record.
To simulate this operation while creating (or updating) only one record, a combination of a transaction record and a view must be used. The transaction record looks like this:
{
"fromacct" : "Alice",
"toacct" : "James",
"value" : 100
}
The above records the movement of money from one account to another. The view can now be updated to handle a transaction record and output a row through emit() to update the value for each account.
function(doc, meta)
{
if (doc.fromacct)
{
emit(doc.fromacct, -doc.value);
emit(doc.toacct, doc.value);
}
else
{
emit(doc.account, doc.value);
}
}
The above map() effectively generates two fake rows, one row subtracts the amount from the source account, and adds the amount to the destination account. The resulting view then uses the reduce() function to sum up the transaction records for each account to arrive at a final balance:
{"rows":[
{"key":"Alice","value":100},
{"key":"James","value":250}
]
}
Throughout the process, only one record has been created, and therefore transient problems with that record update can be captured without corrupting or upsetting the existing stored data.
Simulating multi-phase transactions
The technique in Simulating Transactions works if your data will allow the use of a view to effectively roll-up the changes into a single operation. However, if your data and document structure do not allow it then you can use a multi-phase transaction process to perform the operation in a number of distinct stages.
This method is not reliant on views, but the document structure and update make it easy to find out if there are ‘hanging’ or trailing transactions that need to be processed without additional document updates. Using views and the Observe operation to monitor changes could lead to long wait times during the transaction process while the view index is updated.
To employ this method, you use a similar transaction record as in the previous example, but use the transaction record to record each stage of the update process.
Start with the same two account records:
{
"type" : "account",
"account" : "James",
"value" : 100,
"transactions" : []
}
The record explicitly contains a transactions field which contains an array of all the currently active transactions on this record.
The corresponding record for the other account:
{
"type" : "account",
"account" : "Alice",
"value" : 200,
"transactions" : []
}
Now perform the following operations in sequence:
- Create a new transaction record that records the transaction information:
{ "type" : "transaction", "fromacct" : "Alice", "toacct" : "James", "value" : 100, "status" : "waiting" }
The core of the transaction record is the same, the difference is the use of a status field which will be used to monitor the progress of the transaction.
Record the ID of the transaction, for example, transact_20120717163.
- Set the value of the status field in the transaction document to
‘pending’:
{ "type" : "transaction", "fromacct" : "Alice", "toacct" : "James", "value" : 100, "status" : "pending" }
- Find all transaction records in the pending state using a suitable
view:
function(doc, meta) { if (doc.type && doc.status && doc.type == "transaction" && doc.status == "pending" ) { emit([doc.fromacct,doc.toacct], doc.value); } }
- Update the record identified in toacct with the transaction
information, ensuring that the transaction is not already pending:
{ "type" : "account", "account" : "Alice", "value" : 100, "transactions" : ["transact_20120717163"] }
Repeat on the other account:
{ "type" : "account", "account" : "James", "value" : 200, "transactions" : ["transact_20120717163"] }
- Update the transaction record to mark that the records have been updated:
{ "type" : "transaction", "fromacct" : "Alice", "toacct" : "James", "value" : 100, "status" : "committed" }
- Find all transaction records in the committed state using a suitable
view:
function(doc, meta) { if (doc.type && doc.status && doc.type == "transaction" && doc.status == "committed" ) { emit([doc.fromacct, doc.toacct], doc.value); } }
Update the source account record noted in the transaction and remove the transaction ID:
{ "type" : "account", "account" : "Alice", "value" : 100, "transactions" : [] }
Repeat on the other account:
{ "type" : "account", "account" : "James", "value" : 200, "transactions" : [] }
- Update the transaction record state to ‘done’. This will remove the transaction from the two views used to identify unapplied, or uncommitted transactions.
Within this process, although there are multiple steps required, you can identify at each step whether a particular operation has taken place or not.
For example, if the transaction record is marked as ‘pending’, but the corresponding account records do not contain the transaction ID, then the record still needs to be updated. Since the account record can be updated using a single atomic operation, it is easy to determine if the record has been updated or not.
The result is that any sweep process that accesses the views defined in each step can determine whether the record needs updating. Equally, if an operation fails, a record of the transaction, and whether the update operation has been applied, also exists, allowing the changes to be reversed and backed out.