Pagination in CouchDB?

Most of what I've read about pagination with CouchDB suggests that you fetch the first ten (or however many) items from your view, then record the last document's docid and pass it on to the next page. Unfortunately, I see a few glaring problems with this method.

  • It seems to make it impossible to skip around within the set of pages (if someone jumps directly to page 100, you have to run the queries for pages 2-99 just to know how to load page 100).
  • It requires passing a lot of state information between your pages.
  • It is hard to code correctly.
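For reference, the "remember the last row" approach described above can be sketched as a helper that builds the query options for the following page. This is only a minimal sketch under stated assumptions: `next_page_options` is a hypothetical name, rows are assumed to be plain dicts with `key` and `id` fields, and the only CouchDB features relied on are the standard view query parameters `startkey`, `startkey_docid`, and `limit`. Each query asks for one row more than the page size, and that extra row becomes the starting point of the next page.

```python
def next_page_options(rows, page_size):
    """Split rows fetched with limit=page_size + 1 into (page, next_options).

    next_options is None when there is no further page.
    """
    page = rows[:page_size]
    if len(rows) <= page_size:
        # No extra row came back, so this was the last page.
        return page, None
    first_of_next = rows[page_size]
    return page, {
        "startkey": first_of_next["key"],
        # The docid disambiguates rows that share the same key.
        "startkey_docid": first_of_next["id"],
        "limit": page_size + 1,
    }


# In-memory stand-in for a view response (hypothetical data).
rows = [{"id": "doc%d" % i, "key": i // 2} for i in range(5)]
page, options = next_page_options(rows[:4], page_size=3)
```

With couchdb-python, the returned options could then be passed as keyword arguments to `db.view(...)`. The sketch also shows why jumping to an arbitrary page is awkward: the options for page N only exist once page N-1 has been fetched.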

The obvious alternative is skip, but my testing showed that it causes a significant slowdown on data sets of 5,000 records or more, and it becomes positively crippling once you reach anything really huge (going to page 20,000 with 10 records per page takes about 20 seconds, and yes, there are data sets that large in production). So that is not really an option.
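To put numbers on that example, here is a minimal sketch of the skip-based query options. `skip_options` is a hypothetical helper name; `skip` and `limit` are the standard CouchDB view query parameters.

```python
def skip_options(page_no, page_size):
    """Query options for a 1-based page number using skip/limit."""
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be positive")
    return {"skip": (page_no - 1) * page_size, "limit": page_size}


# Page 20,000 at 10 records per page, as in the example above.
options = skip_options(20000, 10)
rows_skipped = options["skip"]
```

Reaching that page means asking the server to pass over 199,990 rows before it returns the 10 that are wanted, which is consistent with the slowdown described above.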

So what I'm asking is: is there an efficient way to paginate views in CouchDB that can fetch all the items from an arbitrary page? (I'm using couchdb-python, but hopefully nothing about this is client-dependent.)

+7
python couchdb pagination
2 answers

I'm new to CouchDB, but I think I can help. I read the following in CouchDB: The Definitive Guide:

One of the drawbacks of the linked-list style of pagination is that ... jumping to a specific page doesn't really work ... If you really do need to jump to a page over the full range of documents ... you can still maintain an integer value index as the view index and have a hybrid approach to solving pagination.
- http://books.couchdb.org/relax/receipts/pagination

If I read this right, the approach in your case will be as follows:

  • Embed a numeric sequence in your document set.
  • Extract that numeric sequence into a numeric view index.
  • Use arithmetic to calculate the correct start and end numeric keys for your arbitrary page.

For step 1, you need to add something like "page_seq" as a field in your documents. I have no specific recommendation on how to obtain this number, and I'm curious to know what people think. For this scheme to work, it must increase by exactly 1 for each new record, so RDBMS sequences are probably out (the ones I know of may skip numbers).
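One possible way to get such a number, offered only as a sketch and not as a recommendation, is a one-off batch pass that numbers the existing documents in the desired order. `assign_page_seq` and the `created` field are hypothetical, and the documents are plain dicts here. The sketch does not handle concurrent writers or deletions, both of which would leave gaps or duplicates and require renumbering.

```python
def assign_page_seq(docs, sort_key, start=1):
    """Return copies of docs with a gapless page_seq, ordered by sort_key."""
    ordered = sorted(docs, key=sort_key)
    # enumerate() guarantees consecutive integers with no gaps.
    return [dict(doc, page_seq=seq) for seq, doc in enumerate(ordered, start)]


# Hypothetical documents, ordered by an assumed "created" field.
docs = [
    {"_id": "c", "created": 3},
    {"_id": "a", "created": 1},
    {"_id": "b", "created": 2},
]
numbered = assign_page_seq(docs, sort_key=lambda d: d["created"])
```

The numbered copies would then have to be written back to the database, for example with a bulk update.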

For step 2, you would write a view with a map function, something like this (in JavaScript):

 function(doc) { emit(doc.page_seq, doc); }

For step 3, you should write your query something like this (assuming the page and page numbering sequences begin with 1):

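 results = db.view("name_of_view")
 page_size = ...  # say, 20
 page_no = ...  # 1 = page 1, 2 = page 2, etc.
 begin = ((page_no - 1) * page_size) + 1
 # slicing a couchdb-python view result sets startkey/endkey,
 # and endkey is inclusive, so stop on the last key of the page
 end = begin + page_size - 1
 my_page = results[begin:end]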

and then you can iterate through my_page.
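As a worked check of the arithmetic, here is a minimal sketch that runs without a database. `page_key_range` is a hypothetical helper, and the dict is only a stand-in for a view keyed by `page_seq`. It assumes both ends of the key range are inclusive, which is CouchDB's default for `startkey` and `endkey`.

```python
def page_key_range(page_no, page_size):
    """Inclusive (begin, end) page_seq keys for a 1-based page number."""
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be positive")
    begin = (page_no - 1) * page_size + 1
    end = begin + page_size - 1
    return begin, end


# Stand-in for a view keyed by page_seq: 95 documents numbered 1..95.
index = {seq: "doc%d" % seq for seq in range(1, 96)}

begin, end = page_key_range(3, 20)
page = [index[seq] for seq in range(begin, end + 1) if seq in index]

# The last page is simply shorter when the data runs out.
last_begin, last_end = page_key_range(5, 20)
last_page = [index[seq] for seq in range(last_begin, last_end + 1) if seq in index]
```

Page 3 covers keys 41 through 60, so any page can be reached in a single range query without visiting the pages before it.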

One clear drawback is that page_seq assumes you are not filtering the data set for your view; you will quickly run into trouble if you try to make this work with an arbitrary query.

Comments / improvements are welcome.

+3

We solved this problem by using CouchDB Lucene for search. The 0.6 snapshot is stable enough that you should give it a try:

CouchDB Lucene repository

+1