6.2.4. View Cookbook for SQL Jockeys

This is a collection of some common SQL queries and how to get the same result in CouchDB. The key to remember here is that CouchDB does not work like an SQL database at all, and that best practices from the SQL world do not translate well or at all to CouchDB. This document’s “cookbook” assumes that you are familiar with the CouchDB basics such as creating and updating databases and documents.

Using Views

How you would do this in SQL:

CREATE TABLE

or:

ALTER TABLE

How you can do this in CouchDB?

Using views is a two-step process. First you define a view; then you query it. This is analogous to defining a table structure (with indexes) using CREATE TABLE or ALTER TABLE and querying it using an SQL query.

Defining a View

Defining a view is done by creating a special document in a CouchDB database. The only real specialness is the _id of the document, which starts with _design/ — for example, _design/application. Other than that, it is just a regular CouchDB document. To make sure CouchDB understands that you are defining a view, you need to prepare the contents of that design document in a special format. Here is an example:

{
    "_id": "_design/application",
    "_rev": "1-C1687D17",
    "views": {
        "viewname": {
            "map": "function(doc) { ... }",
            "reduce": "function(keys, values) { ... }"
        }
    }
}

We are defining a view viewname. The definition of the view consists of two functions: the map function and the reduce function. Specifying a reduce function is optional. We’ll look at the nature of the functions later. Note that viewname can be whatever you like: users, by-name, or by-date are just some examples.

A single design document can also include multiple view definitions, each identified by a unique name:

{
    "_id": "_design/application",
    "_rev": "1-C1687D17",
    "views": {
        "viewname": {
            "map": "function(doc) { ... }",
            "reduce": "function(keys, values) { ... }"
        },
        "anotherview": {
            "map": "function(doc) { ... }",
            "reduce": "function(keys, values) { ... }"
        }
    }
}

Querying a View

The name of the design document and the name of the view are significant for querying the view. To query the view viewname, you perform an HTTP GET request to the following URI:

/database/_design/application/_view/viewname

database is the name of the database you created your design document in. Next up is the design document name, and then the view name prefixed with _view/. To query anotherview, replace viewname in that URI with anotherview. If you want to query a view in a different design document, adjust the design document name.

MapReduce Functions

MapReduce is a concept that solves problems by applying a two-step process, aptly named the map phase and the reduce phase. The map phase looks at all documents in CouchDB separately one after the other and creates a map result. The map result is an ordered list of key/value pairs. Both key and value can be specified by the user writing the map function. A map function may call the built-in emit(key, value) function 0 to N times per document, creating a row in the map result per invocation.

CouchDB is smart enough to run a map function only once for every document, even on subsequent queries on a view. Only changes to documents or new documents need to be processed anew.

Map functions

Map functions run in isolation for every document. They can’t modify the document, and they can’t talk to the outside world—they can’t have side effects. This is required so that CouchDB can guarantee correct results without having to recalculate a complete result when only one document gets changed.

The map result looks like this:

{"total_rows":3,"offset":0,"rows":[
{"id":"fc2636bf50556346f1ce46b4bc01fe30","key":"Lena","value":5},
{"id":"1fb2449f9b9d4e466dbfa47ebe675063","key":"Lisa","value":4},
{"id":"8ede09f6f6aeb35d948485624b28f149","key":"Sarah","value":6}
]}

It is a list of rows sorted by the value of key. The id is added automatically and refers back to the document that created this row. The value is the data you’re looking for. For example purposes, it’s the girl’s age.

The map function that produces this result is:

function(doc) {
    if(doc.name && doc.age) {
        emit(doc.name, doc.age);
    }
}

It includes the if statement as a sanity check to ensure that we’re operating on the right fields and calls the emit function with the name and age as the key and value.

Look Up by Key

How you would do this in SQL:

SELECT field FROM table WHERE value="searchterm"

How you can do this in CouchDB?

Use case: get a result (which can be a record or set of records) associated with a key (“searchterm”).

To look something up quickly, regardless of the storage mechanism, an index is needed. An index is a data structure optimized for quick search and retrieval. CouchDB’s map result is stored in such an index, which happens to be a B+ tree.

To look up a value by “searchterm”, we need to put all values into the key of a view. All we need is a simple map function:

function(doc) {
    if(doc.value) {
        emit(doc.value, null);
    }
}

This creates a list of documents that have a value field sorted by the data in the value field. To find all the records that match “searchterm”, we query the view and specify the search term as a query parameter:

/database/_design/application/_view/viewname?key="searchterm"

Consider the documents from the previous section, and say we’re indexing on the age field of the documents to find all the five-year-olds:

function(doc) {
    if(doc.age && doc.name) {
        emit(doc.age, doc.name);
    }
}

Query:

/ladies/_design/ladies/_view/age?key=5

Result:

{"total_rows":3,"offset":1,"rows":[
{"id":"fc2636bf50556346f1ce46b4bc01fe30","key":5,"value":"Lena"}
]}

Easy.

Note that you have to emit a value. The view result includes the associated document ID in every row. We can use it to look up more data from the document itself. We can also use the ?include_docs=true parameter to have CouchDB fetch the individual documents for us.

Look Up by Prefix

How you would do this in SQL:

SELECT field FROM table WHERE value LIKE "searchterm%"

How you can do this in CouchDB?

Use case: find all documents that have a field value that starts with searchterm. For example, say you stored a MIME type (like text/html or image/jpg) for each document and now you want to find all documents that are images according to the MIME type.

The solution is very similar to the previous example: all we need is a map function that is a little more clever than the first one. But first, an example document:

{
    "_id": "Hugh Laurie",
    "_rev": "1-9fded7deef52ac373119d05435581edf",
    "mime-type": "image/jpg",
    "description": "some dude"
}

The clue lies in extracting the prefix that we want to search for from our document and putting it into our view index. We use a regular expression to match our prefix:

function(doc) {
    if(doc["mime-type"]) {
        // from the start (^) match everything that is not a slash ([^\/]+) until
        // we find a slash (\/). Slashes needs to be escaped with a backslash (\/)
        var prefix = doc["mime-type"].match(/^[^\/]+\//);
        if(prefix) {
          emit(prefix, null);
        }
    }
}

We can now query this view with our desired MIME type prefix and not only find all images, but also text, video, and all other formats:

/files/_design/finder/_view/by-mime-type?key="image/"

Aggregate Functions

How you would do this in SQL:

SELECT COUNT(field) FROM table

How you can do this in CouchDB?

Use case: calculate a derived value from your data.

We haven’t explained reduce functions yet. Reduce functions are similar to aggregate functions in SQL. They compute a value over multiple documents.

To explain the mechanics of reduce functions, we’ll create one that doesn’t make a whole lot of sense. But this example is easy to understand. We’ll explore more useful reductions later.

Reduce functions operate on the output of the map function (also called the map result or intermediate result). The reduce function’s job, unsurprisingly, is to reduce the list that the map function produces.

Here’s what our summing reduce function looks like:

function(keys, values) {
    var sum = 0;
    for(var idx in values) {
        sum = sum + values[idx];
    }
    return sum;
}

Here’s an alternate, more idiomatic JavaScript version:

function(keys, values) {
    var sum = 0;
    values.forEach(function(element) {
        sum = sum + element;
    });
    return sum;
}

Note

Don’t miss effective built-in reduce functions like _sum and _count

This reduce function takes two arguments: a list of keys and a list of values. For our summing purposes we can ignore the keys-list and consider only the value list. We’re looping over the list and add each item to a running total that we’re returning at the end of the function.

You’ll see one difference between the map and the reduce function. The map function uses emit() to create its result, whereas the reduce function returns a value.

For example, from a list of integer values that specify the age, calculate the sum of all years of life for the news headline, “786 life years present at event.” A little contrived, but very simple and thus good for demonstration purposes. Consider the documents and the map view we used earlier in this document.

The reduce function to calculate the total age of all girls is:

function(keys, values) {
    return sum(values);
}

Note that, instead of the two earlier versions, we use CouchDB’s predefined sum() function. It does the same thing as the other two, but it is such a common piece of code that CouchDB has it included.

The result for our reduce view now looks like this:

{"rows":[
    {"key":null,"value":15}
]}

The total sum of all age fields in all our documents is 15. Just what we wanted. The key member of the result object is null, as we can’t know anymore which documents took part in the creation of the reduced result. We’ll cover more advanced reduce cases later on.

As a rule of thumb, the reduce function should reduce to a single scalar value. That is, an integer; a string; or a small, fixed-size list or object that includes an aggregated value (or values) from the values argument. It should never just return values or similar. CouchDB will give you a warning if you try to use reduce “the wrong way”:

{
    "error":"reduce_overflow_error",
    "message":"Reduce output must shrink more rapidly: Current output: ..."
}

Get Unique Values

How you would do this in SQL:

SELECT DISTINCT field FROM table

How you can do this in CouchDB?

Getting unique values is not as easy as adding a keyword. But a reduce view and a special query parameter give us the same result. Let’s say you want a list of tags that your users have tagged themselves with and no duplicates.

First, let’s look at the source documents. We punt on _id and _rev attributes here:

{
    "name":"Chris",
    "tags":["mustache", "music", "couchdb"]
}
{
    "name":"Noah",
    "tags":["hypertext", "philosophy", "couchdb"]
}
{
    "name":"Jan",
    "tags":["drums", "bike", "couchdb"]
}

Next, we need a list of all tags. A map function will do the trick:

function(doc) {
    if(doc.name && doc.tags) {
        doc.tags.forEach(function(tag) {
            emit(tag, null);
        });
    }
}

The result will look like this:

{"total_rows":9,"offset":0,"rows":[
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"bike","value":null},
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"couchdb","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"couchdb","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"couchdb","value":null},
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"drums","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"hypertext","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"music","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"mustache","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"philosophy","value":null}
]}

As promised, these are all the tags, including duplicates. Since each document gets run through the map function in isolation, it cannot know if the same key has been emitted already. At this stage, we need to live with that. To achieve uniqueness, we need a reduce:

function(keys, values) {
    return true;
}

This reduce doesn’t do anything, but it allows us to specify a special query parameter when querying the view:

/dudes/_design/dude-data/_view/tags?group=true

CouchDB replies:

{"rows":[
{"key":"bike","value":true},
{"key":"couchdb","value":true},
{"key":"drums","value":true},
{"key":"hypertext","value":true},
{"key":"music","value":true},
{"key":"mustache","value":true},
{"key":"philosophy","value":true}
]}

In this case, we can ignore the value part because it is always true, but the result includes a list of all our tags and no duplicates!

With a small change we can put the reduce to good use, too. Let’s see how many of the non-unique tags are there for each tag. To calculate the tag frequency, we just use the summing up we already learned about. In the map function, we emit a 1 instead of null:

function(doc) {
    if(doc.name && doc.tags) {
        doc.tags.forEach(function(tag) {
            emit(tag, 1);
        });
    }
}

In the reduce function, we return the sum of all values:

function(keys, values) {
    return sum(values);
}

Now, if we query the view with the ?group=true parameter, we get back the count for each tag:

{"rows":[
{"key":"bike","value":1},
{"key":"couchdb","value":3},
{"key":"drums","value":1},
{"key":"hypertext","value":1},
{"key":"music","value":1},
{"key":"mustache","value":1},
{"key":"philosophy","value":1}
]}

Enforcing Uniqueness

How you would do this in SQL:

UNIQUE KEY(column)

How you can do this in CouchDB?

Use case: your applications require that a certain value exists only once in a database.

This is an easy one: within a CouchDB database, each document must have a unique _id field. If you require unique values in a database, just assign them to a document’s _id field and CouchDB will enforce uniqueness for you.

There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.