Translating SQL to map/reduce
SELECT fieldlist FROM table \
WHERE condition \
GROUP BY groupfield \
ORDER BY orderfield \
LIMIT limitcount OFFSET offsetcount
The different elements within the source statement affect how a view is written in the following ways:
- SELECT fieldlist
The field list within the SQL statement affects either the corresponding key or value within the map() function, depending on whether you are also selecting or reducing your data.
- FROM table
There are no table compartments within Couchbase Server and you cannot perform views across more than one bucket boundary. However, if you are using a type field within your documents to identify different record types, then you may want to use the map() function to make a selection.
- WHERE condition
The map() function and the data generated into the view key directly affect how you can query, and therefore how selection of records takes place.
- ORDER BY orderfield
The order of record output within a view is directly controlled by the key specified during the map() function phase of the view generation.
- LIMIT limitcount OFFSET offsetcount
There are a number of different paging strategies available within the map/reduce and views mechanism.
- GROUP BY groupfield
Grouping within SQL is handled within views through the use of the reduce() function.
The interaction between the view map() function, reduce() function, selection parameters and other miscellaneous parameters according to the table below:
SQL Statement Fragment | View Key | View Value | map() Function | reduce() Function | Selection Parameters | Other Parameters |
---|---|---|---|---|---|---|
SELECT fields | Yes | Yes | Yes | No: with GROUP BY and SUM() or COUNT() functions only | No | No |
FROM table | No | No | Yes | No | No | No |
WHERE clause | Yes | No | Yes | No | Yes | No |
ORDER BY field | Yes | No | Yes | No | No | descending |
LIMIT x OFFSET y | No | No | No | No | No | limit, skip |
GROUP BY field | Yes | Yes | Yes | Yes | No | No |
Within SQL, the basic query structure can be used for a multitude of different queries. For example, the same ’ SELECT fieldlist FROM table WHERE xxxx can be used with a number of different clauses.
Within map/reduce and Couchbase Server, multiple views may be needed to be created to handled different query types. For example, performing a query on all the blog posts on a specific date will need a very different view definition than one needed to support selection by the author.
Translating SQL SELECT to map/reduce
The field selection within an SQL query can be translated into a corresponding view definition, either by adding the fields to the emitted key (if the value is also used for selection in a WHERE clause), or into the emitted value, if the data is separate from the required query parameters.
For example, to get the sales data by country from each stored document using the following map() function:
function(doc, meta) {
emit([doc.city, doc.sales], null);
}
If you want to output information that can be used within a reduce function, this should be specified in the value generated by each emit() call. For example, to reduce the sales figures the above map() function could be rewritten as:
function(doc, meta) {
emit(doc.city, doc.sales);
}
In essence this does not produce significantly different output (albeit with a simplified key), but the information can now be reduced using the numerical value.
If you want to output data or field values completely separate to the query values, then these fields can be explicitly output within the value portion of the view. For example:
function(doc, meta) {
emit(doc.city, [doc.name, doc.sales]);
}
If the entire document for each item is required, load the document data after the view has been requested through the client library. For more information on this parameter and the performance impact.
Within a SELECT statement it is common practice to include the primary key for a given record in the output. Within a view this is not normally required, since the document ID that generated each row is always included within the view output.
Translating SQL WHERE to map/reduce
The WHERE clause within an SQL statement forms the selection criteria for choosing individual records. Within a view, the ability to query the data is controlled by the content and structure of the key generated by the map() function.
In general, for each WHERE clause you need to include the corresponding field in the key of the generated view, and then use the key, keys or startkey / endkey combinations to indicate the data you want to select.. The complexity occurs when you need to perform queries on multiple fields. There are a number of different strategies that you can use for this.
The simplest way is to decide whether you want to be able to select a specific combination, or whether you want to perform range or multiple selections. For example, using our recipe database, if you want to select recipes that use the ingredient ‘carrot’ and have a cooking time of exactly 20 minutes, then you can specify these two fields in the map() function:
function(doc, meta)
{
if (doc.ingredients)
{
for(i=0; i < doc.ingredients.length; i++)
{
emit([doc.ingredients[i].ingredient, doc.totaltime], null);
}
}
}
Then the query is an array of the two selection values:
?key=["carrot",20]
This is equivalent to the SQL query:
SELECT recipeid FROM recipe JOIN ingredients on ingredients.recipeid = recipe.recipeid
WHERE ingredient = 'carrot' AND totaltime = 20
If, however, you want to perform a query that selects recipes containing carrots that can be prepared in less than 20 minutes, a range query is possible with the same map() function:
?startkey=["carrot",0]&endkey=["carrot",20]
This works because of the sorting mechanism in a view, which outputs in the information sequentially, fortunately nicely sorted with carrots first and a sequential number.
More complex queries though are more difficult. What if you want to select recipes with carrots and rice, still preparable in under 20 minutes?
A standard map() function like that above wont work. A range query on both ingredients will list all the ingredients between the two. There are a number of solutions available to you. First, the easiest way to handle the timing selection is to create a view that explicitly selects recipes prepared within the specified time. I.E:
function(doc, meta)
{
if (doc.totaltime <= 20)
{
...
}
}
Although this approach seems to severely limit your queries, remember you can create multiple views, so you could create one for 10 mins, one for 20, one for 30, or whatever intervals you select. It’s unlikely that anyone will really want to select recipes that can be prepared in 17 minutes, so such granular selection is overkill.
The multiple ingredients is more difficult to solve. One way is to use the client to perform two queries and merge the data. For example, the map() function:
function(doc, meta)
{
if (doc.totaltime && doc.totaltime <= 20)
{
if (doc.ingredients)
{
for(i=0; i < doc.ingredients.length; i++)
{
emit(doc.ingredients[i].ingredient, null);
}
}
}
}
Two queries, one for each ingredient can easily be merged by performing a comparison and count on the document ID output by each view.
The alternative is to output the ingredients twice within a nested loop, like this:
function(doc, meta)
{
if (doc.totaltime && doc.totaltime <= 20)
{
if (doc.ingredients)
{
for (i=0; i < doc.ingredients.length; i++)
{
for (j=0; j < doc.ingredients.length; j++)
{
emit([doc.ingredients[i].ingredient, doc.ingredients[j].ingredient], null);
}
}
}
}
}
Now you can perform an explicit query on both ingredients:
?key=["carrot","rice"]
If you really want to support flexible cooking times, then you can also add the cooking time:
function(doc, meta)
{
if (doc.ingredients)
{
for (i=0; i < doc.ingredients.length; i++)
{
for (j=0; j < doc.ingredients.length; j++)
{
emit([doc.ingredients[i].ingredient, doc.ingredients[j].ingredient, recipe.totaltime], null);
}
}
}
}
And now you can support a ranged query on the cooking time with the two ingredient selection:
?startkey=["carrot","rice",0]&key=["carrot","rice",20]
This would be equivalent to:
SELECT recipeid FROM recipe JOIN ingredients on ingredients.recipeid = recipe.recipeid
WHERE (ingredient = 'carrot' OR ingredient = 'rice') AND totaltime = 20
Translating SQL ORDER BY to map/reduce
The ORDER BY clause within SQL controls the order of the records that are output. Ordering within a view is controlled by the value of the key. However, the key also controls and supports the querying mechanism.
In SELECT statements where there is no explicit WHERE clause, the emitted key can entirely support the sorting you want. For example, to sort by the city and salesman name, the following map() will achieve the required sorting:
function(doc, meta)
{
emit([doc.city, doc.name], null)
}
If you need to query on a value, and that query specification is part of the order sequence then you can use the format above. For example, if the query basis is city, then you can extract all the records for ‘London’ using the above view and a suitable range query:
?endkey=["London\u0fff"]&startkey=["London"]
However, if you want to query the view by the salesman name, you need to reverse the field order in the emit() statement:
function(doc, meta)
{
emit([doc.name,doc.city],null)
}
Now you can search for a name while still getting the information in city order.
The order the output can be reversed (equivalent to ORDER BY field DESC ) by using the descending query parameter.
Translating SQL GROUP BY to map/reduce
The GROUP BY parameter within SQL provides summary information for a group of matching records according to the specified fields, often for use with a numeric field for a sum or total value, or count operation.
For example:
SELECT name,city,SUM(sales) FROM sales GROUP BY name,city
This query groups the information by the two fields ‘name’ and ‘city’ and produces a sum total of these values. To translate this into a map/reduce function within Couchbase Server:
From the list of selected fields, identify the field used for the calculation. These will need to be exposed within the value emitted by the map() function.
Identify the list of fields in the GROUP BY clause. These will need to be output within the key of the map() function.
Identify the grouping function, for example SUM() or COUNT(). You will need to use the equivalent built-in function, or a custom function, within the reduce() function of the view.
For example, in the above case, the corresponding map function can be written as map() :
function(doc, meta)
{
emit([doc.name,doc.city],doc.sales);
}
This outputs the name and city as the key, and the sales as the value. Because the SUM() function is used, the built-in reduce() function _sum can be used.
An example of this map/reduce combination can be seen _sum.
More complex grouping operations may require a custom reduce function.
Translating SQL LIMIT and OFFSET
Within SQL, the LIMIT and OFFSET clauses to a given query are used as a paging mechanism. For example, you might use:
SELECT recipeid,title FROM recipes LIMIT 100
To get the first 100 rows from the database, and then use the OFFSET to get the subsequent groups of records:
SELECT recipeid,title FROM recipes LIMIT 100 OFFSET 100
With Couchbase Server, the limit and skip parameters when supplied to the query provide the same basic functionality:
?limit=100&skip=100
Performance for high values of skip can be affected.