Starting with MongoDB 3.0, simply changing the call order from
collection.aggregate(...).explain()
to
collection.explain().aggregate(...)
will give you the desired results (documentation here).
For older versions (>= 2.6), you need to use the explain option for aggregation pipeline operations:
explain:true
db.collection.aggregate([
    { $project: { "Tags._id": 1 } },
    { $unwind: "$Tags" },
    { $match: { $or: [ { "Tags._id": "tag1" }, { "Tags._id": "tag2" } ] } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $sort: { "count": -1 } }
], { explain: true })
An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (for example, $match, $sort, or $geoNear at the beginning of the pipeline), as well as by subsequent $lookup and $graphLookup stages. Once the data has been fetched into the aggregation pipeline for processing (for example, passing through stages such as $project, $unwind, and $group), further manipulation is performed in memory (possibly using temporary files if the allowDiskUse option is set).
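As a rough illustration of the point above, the sketch below creates an index that a leading $match on "Tags._id" could use and then runs the pipeline with allowDiskUse enabled. The helper name aggregateWithIndex and its coll parameter are hypothetical (they are not part of MongoDB); coll is assumed to be a mongo shell collection such as db.collection.

```javascript
// Hypothetical helper (not part of MongoDB): `coll` is assumed to be a
// mongo shell collection object such as db.collection.
function aggregateWithIndex(coll, pipeline) {
  // Multikey index that a leading $match on "Tags._id" can use.
  // createIndex() is the 3.0+ shell helper; 2.6 shells use ensureIndex().
  coll.createIndex({ "Tags._id": 1 });
  // Allow the later in-memory stages ($group, $sort) to spill to temporary files.
  return coll.aggregate(pipeline, { allowDiskUse: true });
}
```

In the shell this would be called as aggregateWithIndex(db.collection, pipeline). The index only helps if the pipeline actually starts with a stage that can use it.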
Pipeline optimization
In general, you can optimize aggregation pipelines by:
- Starting the pipeline with a $match stage to restrict processing to relevant documents.
- Ensuring the initial $match / $sort stages are supported by an efficient index.
- Filtering data early using $match, $limit, and $skip.
- Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
- Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions, including support for working with arrays, strings, and facets.
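Applied to the example pipeline earlier in this answer, the first three points might look like the sketch below. The variable name filteredPipeline is made up for illustration, and the rewrite assumes an index on "Tags._id" exists; the output should be unchanged because documents with no matching tag contribute nothing to the $group stage anyway.

```javascript
// Tag filter shared by both $match stages ($in is equivalent to the $or above).
const tagFilter = { "Tags._id": { $in: ["tag1", "tag2"] } };

const filteredPipeline = [
  // Leading $match: can use an index and discards irrelevant documents early.
  { $match: tagFilter },
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  // Still needed after $unwind so only the matching array elements are counted.
  { $match: tagFilter },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// In the mongo shell (3.0+): db.collection.explain().aggregate(filteredPipeline)
```

Comparing the explain output of this version with the original is a reasonable way to check whether the leading $match actually uses the index.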
There are also a number of aggregation pipeline optimizations that happen automatically, depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output.
Limitations
As of MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed, but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing the initial query execution, you will most likely find it useful to review the equivalent find().explain() query using executionStats or allPlansExecution verbosity.
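A minimal sketch of that approach, assuming a MongoDB 3.0+ shell and reusing the tag filter from the example pipeline as the equivalent find() query. The helper name explainInitialQuery and its coll parameter are hypothetical; coll stands for a shell collection such as db.collection.

```javascript
// Filter equivalent to the pipeline's $match on "Tags._id".
const initialFilter = { "Tags._id": { $in: ["tag1", "tag2"] } };

// Hypothetical helper: `coll` is assumed to be a mongo shell collection.
function explainInitialQuery(coll, filter, verbosity) {
  // "executionStats" reports keys and documents examined for the winning plan;
  // "allPlansExecution" also includes statistics for rejected candidate plans.
  return coll.find(filter).explain(verbosity || "executionStats");
}

// In the mongo shell: explainInitialQuery(db.collection, initialFilter)
```

This only explains the initial document selection, not the later pipeline stages.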
There are several relevant feature requests to watch / upvote in the MongoDB issue tracker regarding more detailed execution statistics to help optimize / profile aggregation pipelines: