To have some data to work with, we’re creating a list of bands. A document per band:
{ "name":"Biffy Clyro" }
{ "name":"Foo Fighters" }
{ "name":"Tool" }
{ "name":"Incubus" }
{ "name":"Helmet" }
{ "name":"The Fuckin Hate" }
{ "name":"Future of the Left" }
{ "name":"A Perfect Circle" }
{ "name":"Silverchair" }
{ "name":"Queens of the Stone Age" }
{ "name":"Kerub" }
We need a simple map function that gives us an alphabetical list of band names. This should be easy, but we’re adding extra smarts to filter out "The" and "A" in front of band names to put them into the right position.
function(doc) {
if(doc.name) {
var name = doc.name.replace(/^(A|The) /, "");
emit(name, doc);
}
}
The views result is an alphabetical list of band names. Now say we want to display band names five at a time and have a link pointing to the next five names that make up one page; and a link for the previous five, if we’re not on the first page.
We already learned how to use the startkey, limit, and skip parameters in earlier chapters. We’ll use these again here. First, let’s have a look at the full result set:
{"total_rows":11,"offset":0,"rows":[
{"id":"a0746072bba60a62b01209f467ca4fe2","key":"Biffy Clyro","value":null},
{"id":"b47d82284969f10cd1b6ea460ad62d00","key":"Foo Fighters","value":null},
{"id":"45ccde324611f86ad4932555dea7fce0","key":"Fuckin Hate","value":null},
{"id":"d7ab24bb3489a9010c7d1a2087a4a9e4","key":"Future of the Left","value":null},
{"id":"ad2f85ef87f5a9a65db5b3a75a03cd82","key":"Helmet","value":null},
{"id":"a2f31cfa68118a6ae9d35444fcb1a3cf","key":"Incubus","value":null},
{"id":"67373171d0f626b811bdc34e92e77901","key":"Kerub","value":null},
{"id":"3e1b84630c384f6aef1a5c50a81e4a34","key":"Perfect Circle","value":null},
{"id":"84a371a7b8414237fad1b6aaf68cd16a","key":"Queens of the Stone Age","value":null},
{"id":"dcdaf08242a4be7da1a36e25f4f0b022","key":"Silverchair","value":null},
{"id":"fd590d4ad53771db47b0406054f02243","key":"Tool","value":null}
]}
The mechanics of paging are very simple:
-
Display first page.
-
If there are more rows to show, show next link.
-
Draw subsequent page
-
If this is not the first page, show a previous link.
-
If there are more rows to show, show next link.
Or in a pseudo-JavaScript snippet:
var result = new Result();
var page = result.getPage();
page.display();
if(result.hasPrev()) {
page.display_link('prev');
}
if(result.hasNext()) {
page.display_link('next');
}
Don’t use this method! We just show it because it might seem natural to use this and you do want to know why it is a bad idea. To get the first five rows from our view result, you use the ?limit=5 query parameter:
curl -X GET http://127.0.0.1:5984/artists/_design/artists/_view/by-name?limit=5
The result:
{"total_rows":11,"offset":0,"rows":[
{"id":"a0746072bba60a62b01209f467ca4fe2","key":"Biffy Clyro","value":null},
{"id":"b47d82284969f10cd1b6ea460ad62d00","key":"Foo Fighters","value":null},
{"id":"45ccde324611f86ad4932555dea7fce0","key":"Fuckin Hate","value":null},
{"id":"d7ab24bb3489a9010c7d1a2087a4a9e4","key":"Future of the Left","value":null},
{"id":"ad2f85ef87f5a9a65db5b3a75a03cd82","key":"Helmet","value":null}
]}
By comparing the total_rows value to our limit value, we can determine if there are more pages to display. We also know by the offset member that we are on the first page. We can also calculate the value for skip= to get the results for the next page:
var rows_per_page = 5;
var page = (offset / rows_per_page) + 1; // == 1
var skip = page * rows_per_page; // == 5 for the first page, 10 for the second ...
So we query CouchDB with:
curl -X GET 'http://127.0.0.1:5984/artists/_design/artists/_view/by-name?limit=5&skip=5'
Note we had to use ' (quotes), to escape the & character, that is special to the shell we execute curl in.
The result:
{"total_rows":11,"offset":5,"rows":[
{"id":"a2f31cfa68118a6ae9d35444fcb1a3cf","key":"Incubus","value":null},
{"id":"67373171d0f626b811bdc34e92e77901","key":"Kerub","value":null},
{"id":"3e1b84630c384f6aef1a5c50a81e4a34","key":"Perfect Circle","value":null},
{"id":"84a371a7b8414237fad1b6aaf68cd16a","key":"Queens of the Stone Age","value":null},
{"id":"dcdaf08242a4be7da1a36e25f4f0b022","key":"Silverchair","value":null}
]}
Implementing the hasPrev() and hasNext() methods is pretty straightforward:
function hasPrev()
{
return page > 1;
}
function hasNext()
{
var last_page = Math.floor(total_rows / rows_per_page) +
(total_rows % rows_per_page);
return page != last_page;
}
The Dealbreaker
This all looks easy and straightforward, but it has one fatal flaw. Remember how view results are generated from the underlying b-tree index: CouchDB jumps to the first row (or the first row that matches startkey, if provided) and reads one row after the other from the index until there are no more rows (or limit or endkey match, if provided).
skip works like this: In addition to going to the first row and start reading, skip will skip as many rows as specified, but CouchDB will still read from the first row, it just won’t return any values for the skipped rows. If you specify skip=100, CouchDB will read 100 rows and not create output for them. This doesn’t sound too bad, but it is very bad, when you use 1000 or even 10000 as skip values. CouchDB will have to look at a lot of rows unnecessarily.
As a rule of thumb, skip should only be used with single digit values. While possible that there are legit use-cases where you specify a larger value, they are a good indicator for potential problems with your solution.
Finally, for the calculations to work, you need to add a reduce function and make two calls to the view per page to get all the numbering right and there’s still a potential for error.
The correct solution is not much harder either. Instead of slicing the result set into equally sized pages, we look at ten rows at a time and use startkey to jump to the next ten rows. We even use skip, but only with the value 1.
Here is how it works:
-
Request rows_per_page + 1 rows from the view
-
Display rows_per_page rows, store + 1 row as next_startkey and next_startkey_docid
-
As page information, keep startkey and next_startkey
-
Use the next_* values to create the next link, use the others to create the previous link
The trick to find the next page is pretty simple. Instead of requesting ten rows for a page, you request eleven rows, but only display ten and use the values in the eleventh row as the startkey for the next page. Populating the link to the previous page is as simple as carrying the current startkey over to the next page. If there’s no previous startkey, we are on the first page. We stop displaying the link to the next page if we get rows_per_page or less rows back.
This is called linked list pagination, as we go from page to page, or list item to list item, instead of jumping directly to a pre-computed page. There is one caveat though. Can you spot it?
CouchDB view keys do not have to be unique, you can have multiple index entries red. What if you have more index entries for a key than rows that should be on a page? startkey jumps to the first row and you’d be screwed if CouchDB wouldn’t have an additional parameter for you to use. All view keys with the same value are internally sorted by docid, that is, the id of the document that created that view row. You can use the startkey_docid and endkey_docid parameters to get subsets of these rows. For pagination, we still don’t need endkey_docid, but startkey_docid is very handy. In addition to startkey and limit, you also use startkey_docid for pagination if, and only if, the extra row you fetch to find the next page has the same key as the current startkey.
It is important to note that the *_docid parameters only work in addition to the *key parameters and are only useful to further narrow down the result set of a view for a single key. They do not work on their own (the one exception being the built-in _all_docs view that already sorts by document id).
The advantage of this approach is that all they key operations can be performed on the super fast b-tree index behind the view. Looking up a page doesn’t include scanning through hundreds and thousands of rows unnecessarily.
Jump to Page
One drawback of the linked list style pagination is that because you can’t pre-compute the rows for a particular page from the page number and the rows per page. Jumping to a specific page doesn’t really work. Our gut reaction, if that concern is raised is: “Not even Google is doing that!” and we tend to get away with that. Google always pretends on the first page to find 10 more pages of results. Only if you click on the second page (something very few people actually do), Google might display a reduced set of pages. If you page through the result you get links for the previous and next 10 pages, but no more. Pre-computing the necessary startkey and startkey_docid for 20 pages is a feasible operation and a pragmatic optimization to know the rows for every page in a result set that is potentially tens of thousands of rows long, or more.
If you really do need jump to page over the full range of documents (we have seen applications require that), you can still maintain an integer value index (much like in the ordering lists chapter) as the view index and have a hybrid approach at solving pagination.