Efficient MongoDB Aggregation Pagination?

For efficiency, the MongoDB documentation recommends that limit immediately follow sort, which ends up being somewhat nonsensical:

collection.find(f).sort(s).limit(l).skip(p) 

I say somewhat nonsensical because this seems to say: take the first l items, then drop the first p of those l. Since p is usually larger than l, you would expect to end up with no results, but in practice you end up with l results.
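A minimal shell sketch of that behavior; the collection name and the numbers are illustrative:

// With 100 documents, l = 5, p = 20: despite reading as "take 5,
// then drop 20", this still returns 5 documents, not 0.
db.collection.find().sort({ _id: 1 }).limit(5).skip(20)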

Aggregation works more as you would expect:

 collection.aggregate({$unwind: u}, {$group: g},{$match: f}, {$sort: s}, {$limit: l}, {$skip: p}) 

returns 0 results if p >= l.

 collection.aggregate({$unwind: u}, {$group: g}, {$match: f}, {$sort: s}, {$skip: p}, {$limit: l}) 

works, but the documentation seems to imply that the sort will fail if the match returns a result set that exceeds working memory. Is that true? If so, is there a better way to paginate the result set returned by aggregation?

Source: the “Changed in version 2.4” comment at the end of this page: http://docs.mongodb.org/manual/reference/operator/aggregation/sort/

+6
2 answers

With MongoDB cursor methods (i.e. when using find() ), modifiers such as limit , sort , and skip can be applied in any order; the order does not matter. find() returns a cursor, and the modifiers are applied to it. Sort is always performed before limit, and skip is performed before limit as well. In other words, the effective order is: sort → skip → limit .
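As a quick illustration (using the same f, s, l, p as the question), all of these chains return identical results:

db.collection.find(f).sort(s).limit(l).skip(p)
db.collection.find(f).sort(s).skip(p).limit(l)
db.collection.find(f).skip(p).limit(l).sort(s)
// Identical output in every case: the server evaluates
// sort -> skip -> limit regardless of chaining order.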

The aggregation framework does not return a database cursor. Instead, it returns a single document containing the aggregation results. It works by producing intermediate results at each stage of the pipeline, so the order of operations really does matter.

I would guess MongoDB does not support ordering of cursor modifier methods because of the way cursors are implemented internally.

You cannot paginate the result of the aggregation framework, because there is only a single document of results. You can still paginate a regular query with skip and limit, but a better practice is to use a range query, since it makes more efficient use of the index (see the sketch below).
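A sketch of range-based pagination, assuming pages are ordered by an indexed, unique key such as _id; pageSize and lastId are illustrative names:

// First page: plain sorted query.
var page = db.collection.find(f).sort({ _id: 1 }).limit(pageSize).toArray();

// Next page: resume after the last _id seen rather than skipping,
// so the index is used to jump straight to the starting point.
var lastId = page[page.length - 1]._id;
page = db.collection.find({ $and: [ f, { _id: { $gt: lastId } } ] })
                    .sort({ _id: 1 })
                    .limit(pageSize)
                    .toArray();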

UPDATE:

Since v2.6, the MongoDB aggregation framework returns a cursor instead of a single document. Compare the v2.4 and v2.6 documentation.
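A minimal sketch of iterating that cursor in the v2.6+ shell, with the stage contents as in the question:

var cur = db.collection.aggregate([
  { $match: f },
  { $sort: s },
  { $skip: p },
  { $limit: l }
]);
while (cur.hasNext()) {
  printjson(cur.next());   // process one result document at a time
}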

+9

The documentation seems to imply that this (the aggregation) will fail if the match returns a result set that exceeds working memory. Is that true?

No. You can, for example, aggregate over a collection that is larger than physical memory without even using the $match operator. It may be slow, but it should work. There is no problem if $match returns something larger than RAM.

Here are the actual pipeline limits:

http://docs.mongodb.org/manual/core/aggregation-pipeline-limits/

The $match operator alone does not cause memory problems. As stated in the documentation, $group and $sort are the usual villains: they may need to accumulate the entire input set before they can produce any output. If they load too much data into physical memory, they will fail.
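One mitigation worth noting from the pipeline-limits page above: since v2.6, the allowDiskUse option lets $group and $sort spill to temporary files instead of failing when they exceed the per-stage RAM limit. A minimal sketch, reusing the pipeline from the question:

db.collection.aggregate(
  [ { $unwind: u }, { $group: g }, { $match: f }, { $sort: s } ],
  { allowDiskUse: true }   // stages may write temp files instead of aborting
)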

If so, is there a better way to paginate the result set returned by aggregation?

It was correctly said above that you cannot “paginate” (apply $skip and $limit to) the result of the aggregation, because it is a single MongoDB document. But you can “paginate” the intermediate results of the aggregation pipeline.

Using $limit in the pipeline will help keep the result set within the 16 MB limit, the maximum BSON document size. Even if the collection grows, you should be safe.

Problems can arise from $group and, especially, $sort. You can create indexes that support the sort to deal with such problems if they do occur; see the documentation on indexing strategies (a sketch follows the link below).

http://docs.mongodb.org/manual/tutorial/sort-results-with-indexes/
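A sketch of that approach; the field name created_at is illustrative. A $sort can use an index when it runs at the head of the pipeline (optionally preceded by $match):

// Build an index matching the sort order.
db.collection.ensureIndex({ created_at: -1 })

// With $match and $sort leading the pipeline, the sort can walk the
// index instead of accumulating documents in memory.
db.collection.aggregate([
  { $match: f },
  { $sort: { created_at: -1 } },
  { $skip: p },
  { $limit: l }
])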

Finally, keep in mind that $skip does not help performance. On the contrary, it tends to slow the application down, since it forces MongoDB to scan every skipped document in order to reach the desired point in the collection.

http://docs.mongodb.org/manual/reference/method/cursor.skip/

+1
