How does index sorting work in MongoDB?

Question

How does index sorting work in MongoDB?

I am wondering how index sorting really works in MongoDB. The MongoDB documentation has a couple, but they don’t really describe how sorting or time complexity happens. Searches for SO and interweb as a whole have still not found anything suitable.

Suppose that there are documents in the collection, the find () clause matches b documents, is there a limit c of documents being returned, a → b → c, and c is a certain quantity, large enough such that the returned set cannot fit in memory - let's say 1M documents, for example.

At the beginning of the operation, there are b documents that need to be sorted, and a sorted tree index of size a for the function that the documents will be sorted.

I can imagine:

A) move the index in order, and for each ObjectID object a list of b documents intersects. Returns matches until c is reached. It will be O (ab).

B) like A), but first create an ObjectID hash in the b-documents. It is O (a), but accepts O (b) memory.

I tried to consider sortings based on traversing a lot of b-documents, but it seems that I cannot find anything faster than O (b log b), which is no better than sorting without an index.

I suppose (but maybe I'm wrong) that index scan is not required for each view, since does this view really work?

Update:

Kevin's answer provided a link to narrow down the question a lot, but I would like to confirm / clarify a few points:

As I understand it, you cannot use different indexes for querying and sorting if you want to avoid sorting in memory. When I read this page , it looked as if you (or at least did not indicate one way or another), but this seems to be wrong. Essentially, documents are sorted because they are scanned in index order at the time of the request and therefore returned in index order. Correctly?
When querying a composite index, the sort index must be the first index in the composite index, except for indexes where the query is equal. If not, sorting is done in memory. Correctly?
How does sorting work with $in or $or queries? For example, suppose a query
{a: {$in: [4, 6, 2, 1, 3, 10]}, b: {$gt: 1, $lt: 6}}

... and there is the join index on a and b in that order. How will sorting work when sorting is on a or b ? $or even more complicated because, as I understand it, $or queries are essentially divided into several separate queries. Are $or queries always related to memory in memory, at least to combine the results of individual queries?

+13

sorting time-complexity indexing mongodb

elhefe Mar 21 '16 at 21:50

source share

1 answer

kevinadi · Accepted Answer · 2016-04-05T07:49:39+0000

Indexes in MongoDB are stored in a B-tree structure, where each index entry points to a specific disk space. Using a B-tree structure also means that the MongoDB index is stored in sorted order, always scanned in order, and is cheap for MongoDB to get a series of documents in sorted order through indexes.

Update : The B-tree structure is correct for the MMAPv1 storage engine, but implemented in a slightly different way to the WiredTiger storage engine (by default starting with MongoDB 3.2). The basic idea remains the same when index traversal is cheap in sorted order.

The SORT stage (i.e. sorting in memory) in the request is limited to 32 MB of memory usage. The request will not be executed if the SORT level exceeds this limit. This limit can be circumvented by using the sorted nature of indexes so that MongoDB can return a query with the sort() parameter without sorting in memory.

Suppose the request has the form:

  db.a.find({b:{$gt:100}, c:{$gt:200}}).sort(...)

with a collection having index:

  db.a.createIndex({b:1,c:1})

There are two possible scenarios when the sort() step is specified in the request:

1. MongoDB cannot use the sorted nature of the index and must complete the SORT step in memory .

This is the result if the query cannot use the "index prefix". For example:

  db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({c:1})

In the above query, the index {b:1,c:1} can be used to:

Compliance documents having b greater than 100 for the {b:{$gt:100}} request.
However, there is no guarantee that the returned documents are sorted in accordance with c .

Therefore, MongoDB has no choice but to sort in memory. The explain() output of this request will have a SORT stage. This SORT stage will be limited to 32 MB of memory.

2. MongoDB can use the sorted nature of the index .

This is the result if the query uses:

Sort keys according to index order and
Indicates the same order as the index (that is, the index {b:1,c:1} can be used for sort({b:1,c:1}) or sort({b:-1,c:-1}) , but not for sort({b:1,c:-1}) )

For example:

  db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({b:1})

In the above query, the index {b:1,c:1} can be used to:

Compliance documents that have b greater than 100 for the {b:{$gt:100}} request.
In this case, MongoDB can guarantee that the returned documents are sorted according to b .

The explain() output in the request above will not have a SORT stage. In addition, the output of the explain() request with and without sort() identical. Essentially, we get sort() for free.

A useful resource for understanding this topic is to optimize complex MongoDB indexes . Please note that this post was written back in 2012. Although some terms may be outdated, the technical relevance of the post is still relevant.

Updated follow-up information

MongoDB uses only one index for most queries . So, for example, to avoid a step in the SORT memory in the request
```
 db.a.find({a:1}).sort({b:1}) 
```
the index must simultaneously cover fields a and b ; for example, a composite index such as {a:1,b:1} is required. You cannot have two separate indexes {a:1} and {b:1} and expect the index {a:1} be used for the equality part, and the index {b:1} be used for the sort part. In this case, MongoDB will choose one of two indexes.
Thus, it is true that the results are sorted as they are searched and returned in index order.
To avoid sorting in memory using a composite index, the first part of the index should serve the part of the equality request, and the second part should serve the part of the sort request (as shown in the explanation to (1) above).
If you have a request like this:
```
 db.a.find({}).sort({a:1}) 
```
the index {a:1,b:1} can be used for the sort part (since you basically return the entire collection). And if your request looks like this:
```
 db.a.find({a:1}).sort({b:1}) 
```
the same index {a:1,b:1} can also be used for both parts of the query. Also:
```
 db.a.find({a:1,b:1}) 
```
can also use the same index {a:1,b:1}
Pay attention to the diagram here: the find() parameters, followed by sort() , follow the index order {a:1,b:1} . Therefore, the composite index must be ordered by equality → sorting .

Update on sorting of various types

If the field has different types between documents (for example, if a is a string in one document, a number in others, a logical value in another), how will sorting be done?

The answer is MongoDB BSON type comparison order . To paraphrase a man page, order:

MinKey (inner type)
Null
Numbers (integer, long, double, decimal)
Symbol, String
Object Array
Binata
Objectid
Logical
date
Mark
Regular expression
MaxKey (internal type)

Thus, in the above example, using ascending order, documents containing numbers, then strings, and then a boolean value will first appear.

How does index sorting work in MongoDB?

More articles: