Mongo triple compound index

If you have a double compound index {a: 1, b: 1}, it makes sense to me that the index will not be used if you query on b alone (that is, you cannot "skip" a in your query). The index will, however, be used if you query on a alone.

However, given the triple compound index {a: 1, b: 1, c: 1}, my explain command shows that the index is used when querying on a and c (that is, you can "skip" b in your query).

How can Mongo use the abc index to query for ac and how efficient is the index in this case?

Background:

My use case is that sometimes I want to query on a, b, c, and sometimes I want to query on a, c. Should I create only one index on a, b, c, or should I create one on a, c and one on a, b, c?

(It makes no sense to create an index on a, c, b, because c is a multikey index with good selectivity.)

+3
2 answers

Bottom line / tl;dr: Field b can be skipped if a and c are queried for equality or inequality, but not when sorting on c.

This is a very good question. Unfortunately, I could not find anything that authoritatively answers it in more detail. I believe the performance of such queries has improved in recent years, so I would not trust old material on this topic.

The whole thing is quite complicated because it depends on the selectivity of your indexes and on whether you query for equality, inequality and/or sort, so explain() is your only friend. Here are some things I found:

Caution: What follows is a mixture of experimental results, reasoning and guessing. I might be overstretching Kyle's analogy, and I might even be completely wrong (and unlucky, because my test results loosely match my reasoning).

It is clear that the index can be used for a, which, depending on the selectivity of a, is certainly very helpful. Skipping b can be tricky, or not. Let's keep this close to Kyle's cookbook example:

    French
        Beef
            ...
        Chicken
            Coq au Vin
            Roasted Chicken
        Lamb
            ...
        ...

If you now ask me to find a French dish called "Chateaubriand", I can use index a, but since I don't know the ingredient, I have to go through every ingredient list in a. On the other hand, I know that the list of dishes under each ingredient is sorted by c, so I only need to look for the strings starting with, say, "Cha" in each ingredient list. If there are 50 ingredients, I will need 50 lookups instead of just one, but that is much better than scanning every French dish!

In my experiments, the number of lookups was much smaller than the number of distinct values in b: it never seemed to exceed 2. However, I tested this with only one collection, and the result probably depends on the selectivity of b.
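The one-lookup-per-ingredient idea can be sketched in plain JavaScript. This is only a toy model, not MongoDB's implementation: the index is assumed to be a sorted array of [a, b, c] keys, and the names compareKeys, lowerBound and findSkippingB are made up for illustration.

```javascript
// Toy model of a compound index {a, b, c}: a sorted array of [a, b, c] keys.
// Hypothetical helper names; this does not mirror MongoDB internals.

// Lexicographic comparison of two keys of equal length.
function compareKeys(x, y) {
  for (let i = 0; i < x.length; i++) {
    if (x[i] < y[i]) return -1;
    if (x[i] > y[i]) return 1;
  }
  return 0;
}

// Binary search: first position whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(index[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality match on a and c while skipping b:
// one seek per distinct b value found under the given a.
function findSkippingB(index, a, c) {
  const matches = [];
  let lookups = 0;
  let pos = lowerBound(index, [a, -Infinity, -Infinity]);
  while (pos < index.length && index[pos][0] === a) {
    const b = index[pos][1];
    lookups++;
    // Seek straight to (a, b, c) inside this b group.
    pos = lowerBound(index, [a, b, c]);
    while (pos < index.length && compareKeys(index[pos], [a, b, c]) === 0) {
      matches.push(index[pos]);
      pos++;
    }
    // Jump past the rest of this b group to the next distinct b.
    pos = lowerBound(index, [a, b, Infinity]);
  }
  return { matches, lookups };
}

// Build the keys in nested-loop order, which is already sorted order.
const index = [];
for (let a = 0; a < 3; a++) {
  for (let b = 0; b < 50; b++) {
    for (let c = 0; c < 100; c++) {
      index.push([a, b, c]);
    }
  }
}

const result = findSkippingB(index, 1, 42);
const entriesUnderA = index.filter((key) => key[0] === 1).length;
console.log(result.matches.length, result.lookups, entriesUnderA); // 50 50 5000
```

With 50 distinct b values under a = 1, the sketch performs 50 seeks rather than walking all 5000 entries under that a. Whether the real query planner behaves this way for a given query is something only explain() can confirm.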

If you asked me to give you an alphabetically sorted list of all French dishes, I would be in trouble. Now the ordering by c is useless to me, because I would have to merge-sort all those per-ingredient lists. I will have to look at every item to do this.

This is reflected in my tests. Here are some simplified results. The original collection has datetimes, ints and strings, but I wanted to keep things simple, so it is all ints now.
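Why sorting on c breaks down when b is skipped can be seen in the same kind of toy model. The sketch below assumes the index is a sorted array of [a, b, c] keys; isSortedByC is a hypothetical helper, and none of this is MongoDB's actual code.

```javascript
// Toy index on {a, b, c}: keys generated in sorted (a, b, c) order.
const index = [];
for (let a = 0; a < 2; a++) {
  for (let b = 0; b < 3; b++) {
    for (let c = 0; c < 4; c++) {
      index.push([a, b, c]);
    }
  }
}

// True if the c component never decreases along the given run of keys.
function isSortedByC(keys) {
  return keys.every((key, i) => i === 0 || keys[i - 1][2] <= key[2]);
}

// All entries for a = 1, in index order: c restarts for every new b.
const underA = index.filter((key) => key[0] === 1);
// Entries for a = 1 and b = 2, in index order: c is already sorted.
const underAB = underA.filter((key) => key[1] === 2);

console.log(isSortedByC(underA));  // false
console.log(isSortedByC(underAB)); // true

// To return everything under a = 1 ordered by c, every entry has to be
// read and re-sorted (or the per-b lists merged).
const sortedByC = [...underA].sort((x, y) => x[2] - y[2]);
console.log(sortedByC.length === underA.length); // true
```

Once b is fixed by an equality condition, the entries come back already ordered by c, which fits the test results where sorting on c is fast only if b is part of the query.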

Essentially, there are only two classes of queries: those where nscanned <= 2 * limit , and those that should scan the entire collection (120 thousand documents). The index is {a, b, c} :

    // fast (range query on c while skipping b)
    > db.Test.find({"a" : 43, "c" : { $lte : 45454 }});

    // slow (sorting)
    > db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "c" : -1});
    > db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "b" : -1});

    // fast (can sort on c if b included in the query)
    > db.Test.find({"a" : 43, "b" : 7887, "c" : { $lte : 45454 }}).sort({ "c" : -1});

    // fast (older tutorials claim this is slow)
    > db.Test.find({"a" : {$gte : 43}, "c" : { $lte : 45454 }});

Your mileage may vary.

+2

You can think of a query on A and C as a special case of a query on A (for which the index would be used). Checking C against the index is more efficient than loading every candidate document.

Suppose you wanted all documents with A between 7 and 13 and C between 5 and 8.

If you only have an index on A: the database can use the index to select the documents with A between 7 and 13, but to check whether C is between 5 and 8, it also has to fetch each of those documents.

If you have an index on A, B, and C: the database can use the index to select the documents with A between 7 and 13. Since the values of C are already stored in the index entries, it can determine whether the corresponding documents meet the C criterion without having to fetch those documents. Therefore, you avoid disk reads and get better performance.
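The difference in document fetches can be illustrated with a small simulation. It is a sketch under made-up assumptions: 1000 synthetic documents, index entries modelled as plain { key, id } objects, a linear pass standing in for the index range scan on A, and a hypothetical run function that counts fetches. In this model the final matching documents are still fetched in both cases.

```javascript
// Synthetic collection; the field values are arbitrary test data.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: i, A: i % 20, B: i % 7, C: i % 10 });
}

// Index entries: the indexed key values plus a pointer to the document.
const indexA = docs.map((d) => ({ key: [d.A], id: d._id }));
const indexABC = docs.map((d) => ({ key: [d.A, d.B, d.C], id: d._id }));

// Query: A between 7 and 13 and C between 5 and 8.
// checkCInIndex = true models an index whose entries also carry C.
function run(index, checkCInIndex) {
  let fetches = 0;
  const results = [];
  for (const entry of index) {
    const a = entry.key[0];
    if (a < 7 || a > 13) continue; // outside the range scanned on A
    if (checkCInIndex) {
      const c = entry.key[2];
      if (c < 5 || c > 8) continue; // rejected from the index entry alone
    }
    const doc = docs[entry.id]; // document fetch
    fetches++;
    if (doc.C >= 5 && doc.C <= 8) results.push(doc);
  }
  return { fetches, results };
}

const withA = run(indexA, false);
const withABC = run(indexABC, true);
console.log(withA.fetches, withABC.fetches);               // 350 100
console.log(withA.results.length, withABC.results.length); // 100 100
```

Both variants return the same 100 documents, but the index on A alone fetches all 350 candidates, while the index that carries C fetches only the 100 that actually match.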

+1
