MongoDB - date index not used

The events collection holds documents with a userId and an array of events; each element in the array is an embedded document. Example:

    {
        "_id" : ObjectId("4f8f48cf5f0d23945a4068ca"),
        "events" : [
            {
                "eventType" : "profile-updated",
                "eventId" : "247266",
                "eventDate" : ISODate("1938-04-27T23:05:51.451Z")
            },
            {
                "eventType" : "login",
                "eventId" : "64531",
                "eventDate" : ISODate("1948-05-15T23:11:37.413Z")
            }
        ],
        "userId" : "junit-19568842"
    }

Using a query similar to the one below, I need to find the events generated in the last 30 days:

    db.events.find( {
        events : {
            $elemMatch : {
                "eventId" : 201,
                "eventDate" : { $gt : new Date(1231657163876) }
            }
        }
    } ).explain()

The query plan shows that the index on "events.eventDate" is used when the test data contains only a few events (about 20):

    {
        "cursor" : "BtreeCursor events.eventDate_1",
        "nscanned" : 0,
        "nscannedObjects" : 0,
        "n" : 0,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
            "events.eventDate" : [
                [
                    ISODate("2009-01-11T06:59:23.876Z"),
                    ISODate("292278995-01--2147483647T07:12:56.808Z")
                ]
            ]
        }
    }

However, when there are a large number of events (about 500), the index is not used:

    {
        "cursor" : "BasicCursor",
        "nscanned" : 4,
        "nscannedObjects" : 4,
        "n" : 0,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : { }
    }

Why is the index not used when there are many events? Could it be that, with a large number of events, MongoDB decides it is more efficient to scan all the documents instead of using the index?

2 answers
The MongoDB query optimizer works in an unusual way. Instead of estimating the cost of each candidate query plan, it simply runs all available plans. Whichever plan returns first is considered optimal and is reused for later queries.

As the application grows and the data grows and changes, the chosen plan may stop being optimal at some point. For that reason, mongo repeats this plan selection process every once in a while.

It seems that in this particular case the basic scan (BasicCursor, i.e. a scan of the whole collection) was the most efficient.
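To make the "run every plan and keep the first one to finish" idea concrete, here is a toy sketch in plain JavaScript. It is only an illustration under assumptions, not MongoDB's actual implementation: `runPlan`, `racePlans`, and the work-unit numbers are all made up for the example.

```javascript
// Toy model only; this is not MongoDB's actual implementation.
// A candidate plan is modelled as a generator that yields once per unit of
// work (for example, one index entry or one document examined).
function* runPlan(workUnits) {
  for (let i = 0; i < workUnits; i++) {
    yield;
  }
}

// Advance every candidate one step at a time, round-robin, and return the
// name of the first one to finish. Ties go to the plan listed first.
function racePlans(candidates) {
  if (candidates.length === 0) {
    return null; // nothing to race
  }
  const running = candidates.map(c => ({ name: c.name, steps: runPlan(c.workUnits) }));
  for (;;) {
    for (const plan of running) {
      if (plan.steps.next().done) {
        return plan.name;
      }
    }
  }
}

// Hypothetical work-unit counts, chosen only for illustration.
const winner = racePlans([
  { name: "BtreeCursor events.eventDate_1", workUnits: 7 },
  { name: "BasicCursor", workUnits: 4 },
]);
console.log(winner); // BasicCursor
```

In this model the plan that needs fewer steps wins without any cost estimate being computed up front, which is the behaviour described above.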

Link: http://www.mongodb.org/display/DOCS/Query+Optimizer


When $hint is used to force the events.eventDate index, nscannedObjects is larger than without the index.

Pseudocode when using the index:

    for (all entries in the index matching the criteria) {
        get the user object and scan it to see if the eventId criteria is met
    }

"All entries in the index matching the criteria": each event is an entry in the index, so the number of index entries is greater than the number of users. Say there are 4 user objects and a total of 7 events matching the criteria: the user object is then checked 7 times (the loop runs 7 times). When the index is not used, each of the 4 user objects is checked only once. Thus, with the index, the user objects are examined more times than without it. Is this understanding correct?

    db.events.find( {
        events : {
            $elemMatch : {
                "eventId" : 201,
                "eventDate" : { $gt : new Date(1231657163876) }
            }
        }
    } )._addSpecial("$hint", { "events.eventDate" : 1 }).explain()

    {
        "cursor" : "BasicCursor",
        "nscanned" : 7,
        "nscannedObjects" : 7,
        "n" : 0,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : { }
    }
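The counting argument (4 user objects, 7 matching events) can be checked with a small plain-JavaScript model. This is only a sketch of the reasoning in the pseudocode, not of what the server does internally. The sample data and the helper names `collectionScanChecks` and `indexScanChecks` are hypothetical.

```javascript
// Toy model of the counting argument; plain JavaScript, no MongoDB involved.
// Hypothetical sample data: 4 user documents, 7 events newer than the cutoff.
const cutoff = new Date(1231657163876);
const recent = new Date("2012-04-01T00:00:00Z");
const old = new Date("1948-05-15T23:11:37Z");

const users = [
  { userId: "u1", events: [{ eventId: 1, eventDate: recent }, { eventId: 2, eventDate: recent }] },
  { userId: "u2", events: [{ eventId: 3, eventDate: recent }, { eventId: 4, eventDate: recent },
                           { eventId: 5, eventDate: old }] },
  { userId: "u3", events: [{ eventId: 6, eventDate: recent }, { eventId: 7, eventDate: recent }] },
  { userId: "u4", events: [{ eventId: 8, eventDate: recent }] },
];

// Collection scan: every document is examined exactly once.
function collectionScanChecks(docs) {
  return docs.length;
}

// Index-driven scan as modelled in the pseudocode: one index entry per array
// element, and one document check per entry that satisfies the date bound.
function indexScanChecks(docs, minDate) {
  const entries = docs.flatMap(doc =>
    doc.events.map(ev => ({ key: ev.eventDate, doc })));
  return entries.filter(entry => entry.key > minDate).length;
}

console.log(collectionScanChecks(users));    // 4
console.log(indexScanChecks(users, cutoff)); // 7
```

Under this model the index path touches a user object once per matching event, while the full scan touches each user object once, so the index only pays off when the matching index entries are few relative to the number of documents.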