MongoDB Explain for Aggregation Framework

Question

Is there an explain function for the aggregation framework in MongoDB? I can't see it in the documentation.

If not, is there some other way to check how a query performs within the aggregation framework?

I know that with find you just do

db.collection.find().explain()

But with the aggregation framework I get an error:

 db.collection.aggregate( { $project : { "Tags._id" : 1 }}, { $unwind : "$Tags" }, { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}}, { $group: { _id : { id: "$_id"}, "count": { $sum:1 } } }, { $sort: {"count":-1}} ).explain()

mongodb aggregation-framework

SCB 03 Oct '12 at 4:50

3 answers

Starting with version 2.6.x, MongoDB allows users to run explain with the aggregation framework.

All you need to do is add explain: true

 db.records.aggregate( [ ...your pipeline...], { explain: true } )

Thanks to Rafa, I know that this was possible even in 2.4, but only through runCommand(). Now you can use aggregate as well.
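
As a rough sketch, the runCommand() form mentioned above could look like the following. The collection name records is taken from the example above; the pipeline contents are a placeholder assumption, and the check for db is only there so the snippet also loads outside the mongo shell.

```javascript
// Placeholder pipeline (an assumption for illustration; substitute your own stages).
var pipeline = [
  { $match: { "Tags._id": "tag1" } },
  { $group: { _id: "$_id", count: { $sum: 1 } } }
];

// The aggregate database command takes the collection name, the pipeline,
// and explain: true to return plan information instead of results.
var explainCommand = {
  aggregate: "records",
  pipeline: pipeline,
  explain: true
};

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  printjson(db.runCommand(explainCommand));
}
```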

Salvador Dali Oct 26 '13 at 1:16

The aggregation framework is a set of analytics tools within MongoDB that allows us to run various types of reports or analysis on documents in one or more collections. It is based on the idea of a pipeline. We take input from a MongoDB collection and pass the documents from that collection through one or more stages, each of which performs a different operation on its input. Each stage takes as input whatever the stage before it produced as output. The inputs and outputs for all stages are a stream of documents. Each stage has a specific job that it does: it expects a specific form of document and produces a specific output, which is itself a stream of documents. At the end of the pipeline we get access to the output.

An individual stage is a data processing unit. Each stage takes a stream of documents as input, processes each document one at a time, and produces an output stream of documents, again one at a time. Each stage provides a set of knobs or tunables that we can control to parameterize the stage to perform whatever task we are interested in. So a stage performs a generic task of some kind, and we parameterize it for the particular set of documents we are working with and for exactly what we would like that stage to do with those documents. These tunables typically take the form of operators that we can supply to modify fields, perform arithmetic operations, reshape documents, or do some sort of accumulation task, as well as many other things. It is often the case that we will want to include the same type of stage multiple times within a single pipeline.

For example, we may wish to perform an initial filter so that we don't have to pass the entire collection into our pipeline, and then later, following some additional processing, filter once again using a different set of criteria. So, to recap, a pipeline works with a MongoDB collection. Pipelines are composed of stages, each of which does a different data processing task on its input and produces documents as output to be passed to the next stage. Finally, at the end of the pipeline, output is produced that we can then do something with inside our application. In many cases, it is necessary to include the same type of stage multiple times within an individual pipeline.
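
To make the "stream of documents through stages" idea concrete, here is a toy in-memory model in plain JavaScript. It is not how MongoDB implements pipelines; the helper names (match, unwind, runPipeline) and the sample documents are made up for illustration, and tags are simplified to plain strings.

```javascript
// Toy in-memory model of a pipeline (not MongoDB's implementation).
// Each stage is a function from an array of documents to a new array.
function match(predicate) {
  return function (docs) { return docs.filter(predicate); };
}

// Emit one output document per element of the array field `field`.
function unwind(field) {
  return function (docs) {
    var out = [];
    docs.forEach(function (doc) {
      (doc[field] || []).forEach(function (item) {
        var copy = Object.assign({}, doc);
        copy[field] = item;
        out.push(copy);
      });
    });
    return out;
  };
}

// Feed the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce(function (stream, stage) { return stage(stream); }, docs);
}

var input = [
  { _id: 1, Tags: ["tag1", "tag3"] },
  { _id: 2, Tags: ["tag3"] },
  { _id: 3, Tags: ["tag2", "tag1"] }
];
var wanted = ["tag1", "tag2"];

var output = runPipeline(input, [
  // First filter: keep documents carrying at least one wanted tag.
  match(function (d) {
    return d.Tags.some(function (t) { return wanted.indexOf(t) !== -1; });
  }),
  unwind("Tags"),
  // Second filter of the same type, now applied to single unwound tags.
  match(function (d) { return wanted.indexOf(d.Tags) !== -1; })
]);
```

Note that the same kind of stage (the filter) appears twice, once before and once after the unwind, which is the pattern described above.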

xameeramir Sep 15 '16 at 19:29

Stennie · Accepted Answer · 2012-10-03 05:58

Starting with MongoDB 3.0, simply changing the order from

 collection.aggregate(...).explain()

to

 collection.explain().aggregate(...)

will give you the desired results (documentation here).

For older versions >= 2.6, you will need to keep using the explain option for aggregation pipeline operations

`explain:true`

 db.collection.aggregate([ { $project : { "Tags._id" : 1 }}, { $unwind : "$Tags" }, { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}}, { $group: { _id : "$_id", count: { $sum:1 } }}, {$sort: {"count":-1}} ], { explain:true } )

An important consideration with the aggregation framework is that indexes can only be used to fetch the initial data for a pipeline (for example, $match, $sort, or $geoNear at the beginning of a pipeline), as well as by subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (for example, passing through stages like $project, $unwind, and $group), further manipulation happens in memory (possibly using temporary files if the allowDiskUse option is set).
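
A minimal sketch of that split, assuming an index on the "Tags._id" field from the question (the index itself and the grouping key are my assumptions, and createIndex is the shell helper from 3.0 onward):

```javascript
// Index key chosen to match the field filtered on by the first stage (assumption).
var indexSpec = { "Tags._id": 1 };

var pipeline = [
  // Initial $match: the only stage here that can be served by the index.
  { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } },
  // Later stages work in memory on the documents already fetched.
  { $unwind: "$Tags" },
  { $group: { _id: "$Tags._id", count: { $sum: 1 } } }
];

// explain shows the plan; allowDiskUse permits temporary files for large stages.
var options = { explain: true, allowDiskUse: true };

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== "undefined") {
  db.collection.createIndex(indexSpec);
  printjson(db.collection.aggregate(pipeline, options));
}
```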

Pipeline optimization

In general, you can optimize aggregation pipelines:

Starting a pipeline with a $match stage to restrict processing to relevant documents.
Ensuring the initial $match / $sort stages are supported by an efficient index.
Filtering data early using $match, $limit, and $skip.
Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions, including support for working with arrays, strings, and facets.
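
Applied to the pipeline from the question, the first and third points might look roughly like this. It is only a sketch: $in is used as an equivalent of the original $or over two equality checks, and firstStage is a hypothetical helper added just to show which stage leads each pipeline.

```javascript
// Pipeline from the question: the filter only runs after $project and $unwind.
var original = [
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// Rewritten sketch: put a copy of the filter first so fewer documents enter
// the pipeline, and keep the later $match to drop non-matching unwound tags.
var tagFilter = { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } };
var optimized = [tagFilter].concat(original);

// Hypothetical helper: name of the stage operator at the head of a pipeline.
function firstStage(pipeline) {
  return Object.keys(pipeline[0])[0];
}

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  printjson(db.collection.aggregate(optimized, { explain: true }));
}
```

Documents without any matching tag contribute nothing to the original result, so filtering them out up front should not change the output, only the amount of work done.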

There are also a number of aggregation pipeline optimizations that happen automatically depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output.

Limitations

As of MongoDB 3.4, the aggregation framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution, you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.
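
For instance, the equivalent find() check for a pipeline that starts by filtering on the tags from the question could be sketched like this (the query shape is my assumption about what the initial $match would contain):

```javascript
// Same filter as the pipeline's initial $match (field and values from the question).
var query = { "Tags._id": { $in: ["tag1", "tag2"] } };

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  var plan = db.collection.find(query).explain("executionStats");
  // executionStats reports, among other things, documents and index keys examined.
  printjson(plan.executionStats);
}
```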

There are several relevant feature requests to watch or upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize and profile aggregation pipelines.
