MongoDB: slow performance of "$group"

I have a MongoDB collection of over 1,000,000 documents. Each document is about 20 KB, so the total collection size is about 20 GB.

The collection has a type field that takes about 10 distinct values, and the field is indexed. I would like to get a count of documents for each type across the collection.

I tested two different approaches (shown in Python syntax):
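For context, both snippets below assume a pymongo client along these lines; the connection string and database name are placeholders, not from the question:

    from pymongo import MongoClient

    # Placeholder connection details; adjust to your deployment.
    client = MongoClient('mongodb://localhost:27017')
    my_db = client['my_db']  # placeholder database name; the collection is my_colc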

The naive approach issues a count() call for each distinct value:

    counters = {}
    for type_val in my_db.my_colc.distinct('type'):
        counters[type_val] = my_db.my_colc.find({'type': type_val}).count()

The second uses the aggregation pipeline with a '$group' stage:

    counters = my_db.my_colc.aggregate([
        {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}
    ])
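Note that what aggregate() returns depends on the driver version: pymongo 2.x returns a command response document whose 'result' key holds the groups, while pymongo 3.x and later return an iterable cursor. With a cursor, the output can be reshaped into the same dict as approach #1:

    result = my_db.my_colc.aggregate([
        {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}
    ])
    counters = {doc['_id']: doc['agg_val'] for doc in result}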

The first approach is about two orders of magnitude faster than the second (roughly 1 minute versus 45 minutes). This seems to be because count can be answered from the index alone, without touching the documents, while $group iterates over the documents one at a time.
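One way to see what the query side is doing is explain(); on MongoDB 2.6 the output includes fields such as 'cursor' and 'indexOnly' (the exact field names differ on newer server versions). A quick check, with a hypothetical type value:

    # 'some_type' is a hypothetical value from the type field.
    plan = my_db.my_colc.find({'type': 'some_type'}).explain()
    print(plan.get('cursor'), plan.get('indexOnly'))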

Is there a way to run the grouping query so that it uses only the type index, achieving the performance of approach #1 while still using the aggregation framework?

I am using MongoDB 2.6.1

Update: there is an open ticket for this issue in the MongoDB Jira: https://jira.mongodb.org/browse/SERVER-11447

Tags: performance, mongodb, pymongo

1 answer

In the aggregation pipeline, the $group stage does not use indexes. It should come after a $match stage, which can use indexes and so cuts down the number of documents that $group has to process.

http://docs.mongodb.org/manual/core/aggregation-pipeline/#aggregation-pipeline-operators-and-performance
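For example, when only a subset of types is needed, a $match stage placed before $group lets the planner use the index on type to filter documents early. A sketch, with hypothetical type values:

    pipeline = [
        # $match first, so the index on 'type' can be used to filter.
        {'$match': {'type': {'$in': ['type_a', 'type_b']}}},
        # $group then only sees the matched documents.
        {'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}},
    ]
    counters = my_db.my_colc.aggregate(pipeline)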

answered by amuses