How to combine an array field in a document in Mongo aggregation

Question


I need to run an aggregation over two records that each have an array field with different values. When I aggregate these records, the result should contain a single array holding the unique values from both arrays. Here is an example:

First record:

    { Host: "abc.com", ArtId: "123", tags: [ "tag1", "tag2" ] }

Second record:

    { Host: "abc.com", ArtId: "123", tags: [ "tag2", "tag3" ] }

After aggregation on host and artid, I need the result as follows:

    { Host: "abc.com", ArtId: "123", count: 2, tags: [ "tag1", "tag2", "tag3" ] }

I tried $addToSet in the group expression, but it gives me an array of arrays instead: tags: [["tag1","tag2"],["tag2","tag3"]]

How can I achieve this in an aggregation?


mongodb mongodb-query aggregation-framework

viren Nov 13 '14 at 8:56


1 answer

Neil Lunn · Accepted Answer · 2014-11-13T09:08:51+0000

TL;DR

In modern releases (MongoDB 3.4 and later), you should use $reduce with $setUnion after the original $group, as shown:

    db.collection.aggregate([
      { "$group": {
        "_id": { "Host": "$Host", "ArtId": "$ArtId" },
        "count": { "$sum": 1 },
        "tags": { "$addToSet": "$tags" }
      }},
      { "$addFields": {
        "tags": {
          "$reduce": {
            "input": "$tags",
            "initialValue": [],
            "in": { "$setUnion": [ "$$value", "$$this" ] }
          }
        }
      }}
    ])
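If it helps to see what the $addFields stage actually computes, here is a minimal plain-JavaScript sketch (runnable in Node, no MongoDB involved) that mimics the $group plus $reduce/$setUnion combination on the two sample documents. The helper names setUnion and groupByHostAndArtId are hypothetical stand-ins, not MongoDB APIs, and the sketch keeps first-seen order, whereas MongoDB's $setUnion does not promise any particular order.

```javascript
// Sample documents from the question (missing commas restored).
const docs = [
  { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] },
  { Host: "abc.com", ArtId: "123", tags: ["tag2", "tag3"] },
];

// Hypothetical stand-in for $setUnion on two arrays: the distinct elements
// of both. Keeps first-seen order; MongoDB's $setUnion does not guarantee order.
function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

// Stand-in for the $group stage: key on { Host, ArtId }, count the documents
// and collect every "tags" array, giving an array of arrays. ($addToSet would
// also drop duplicate arrays, which does not change the final union.)
function groupByHostAndArtId(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.Host, doc.ArtId]);
    if (!groups.has(key)) {
      groups.set(key, { _id: { Host: doc.Host, ArtId: doc.ArtId }, count: 0, tags: [] });
    }
    const group = groups.get(key);
    group.count += 1;
    group.tags.push(doc.tags);
  }
  return [...groups.values()];
}

// Stand-in for $addFields + $reduce: start from [] ("initialValue") and fold
// each inner array ("$$this") into the accumulator ("$$value") with setUnion.
const result = groupByHostAndArtId(docs).map((group) => ({
  ...group,
  tags: group.tags.reduce((value, current) => setUnion(value, current), []),
}));

console.log(JSON.stringify(result));
// [{"_id":{"Host":"abc.com","ArtId":"123"},"count":2,"tags":["tag1","tag2","tag3"]}]
```

The fold starts from an empty array so that a group with a single document still ends up with a flat array rather than null.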

You were right to look at the $addToSet operator, but when working with content inside an array you generally need to process it with $unwind first. This "de-normalizes" the array entries and essentially makes a "copy" of the parent document for each array entry, with that entry as the single value of the field. That is what avoids the behavior you are seeing without it.
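To illustrate the "de-normalizing" described above, here is a small plain-JavaScript sketch (Node, not MongoDB) of what $unwind does to the two sample documents. The unwind helper is hypothetical; it assumes the field is always an array and, like $unwind's default, emits nothing for an empty array.

```javascript
// Hypothetical stand-in for { "$unwind": "$<field>" }: emit one copy of each
// document per element of doc[field], with the field replaced by that single
// element. Assumes doc[field] is always an array.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const element of doc[field]) {
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const docs = [
  { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] },
  { Host: "abc.com", ArtId: "123", tags: ["tag2", "tag3"] },
];

const unwound = unwind(docs, "tags");
// Four documents, each with a single string in "tags", e.g.
// { Host: "abc.com", ArtId: "123", tags: "tag1" }
console.log(unwound);

// An $addToSet-style accumulation now sees strings, not whole arrays,
// so the distinct values come out flat.
const distinctTags = [...new Set(unwound.map((doc) => doc.tags))];
console.log(distinctTags); // [ 'tag1', 'tag2', 'tag3' ]
```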

Your "count" requirement presents an interesting problem, but it is easily solved with a "double unwind" after the initial $group:

    db.collection.aggregate([
      // Group on the compound key and get the occurrences first
      { "$group": {
        "_id": { "Host": "$Host", "ArtId": "$ArtId" },
        "tcount": { "$sum": 1 },
        "ttags": { "$push": "$tags" }
      }},
      // Unwind twice because "ttags" is now an array of arrays
      { "$unwind": "$ttags" },
      { "$unwind": "$ttags" },
      // Now use $addToSet to get the distinct values
      { "$group": {
        "_id": "$_id",
        "tcount": { "$first": "$tcount" },
        "tags": { "$addToSet": "$ttags" }
      }},
      // Optionally $project to get the fields out of the _id key
      { "$project": {
        "_id": 0,
        "Host": "$_id.Host",
        "ArtId": "$_id.ArtId",
        "count": "$tcount",
        "tags": "$tags"
      }}
    ])
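Why two $unwind stages? The following plain-JavaScript sketch (Node, no MongoDB) mimics the pipeline from the output of the first $group onwards, so you can see "ttags" go from an array of arrays, to arrays, to single strings, before being regrouped. The unwind helper and the variable names are hypothetical, and the tag order it produces is first-seen order, whereas MongoDB's $addToSet makes no ordering guarantee.

```javascript
// Hypothetical stand-in for { "$unwind": "$<field>" } (plain JavaScript,
// not MongoDB): one copy of each document per element of doc[field].
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((element) => ({ ...doc, [field]: element })));
}

// What the first $group ($sum + $push) produces for the sample data:
// "ttags" holds one array per original document, i.e. an array of arrays.
const grouped = [
  {
    _id: { Host: "abc.com", ArtId: "123" },
    tcount: 2,
    ttags: [["tag1", "tag2"], ["tag2", "tag3"]],
  },
];

const once = unwind(grouped, "ttags"); // 2 docs; "ttags" is still an array
const twice = unwind(once, "ttags");   // 4 docs; "ttags" is now a string

// Stand-in for the second $group: keep the first "tcount" seen per key
// ($first) and collect the distinct "ttags" values ($addToSet).
const regrouped = new Map();
for (const doc of twice) {
  const key = JSON.stringify(doc._id);
  if (!regrouped.has(key)) {
    regrouped.set(key, { _id: doc._id, tcount: doc.tcount, tags: new Set() });
  }
  regrouped.get(key).tags.add(doc.ttags);
}

// Stand-in for the final $project: lift Host and ArtId out of _id and
// rename "tcount" to "count".
const projected = [...regrouped.values()].map((group) => ({
  Host: group._id.Host,
  ArtId: group._id.ArtId,
  count: group.tcount,
  tags: [...group.tags],
}));

console.log(JSON.stringify(projected));
// [{"Host":"abc.com","ArtId":"123","count":2,"tags":["tag1","tag2","tag3"]}]
```

Every unwound copy carries the same "tcount" of 2, which is why taking the first one is safe.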

The final $project is also there because I used "temporary" names for the fields in the other stages of the pipeline. This is because $project "copies" the fields that already exist from the previous stage in the order they already appeared, before any new fields are added to the document.

Otherwise, the output will look like this:

    { "count": 2, "tags": [ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }

In other words, the fields would not be in the order you might expect. It is trivial, but it matters to some people, so it is worth explaining why it happens and how to handle it.

So $unwind does the work of separating out the individual elements rather than whole arrays, and doing the $group first lets you get the "count" of documents in each grouping.

The $first operator used later then "keeps" the "count" value, because that value was simply "duplicated" for every element present in the "tags" arrays. It is the same value in every copy, so it does not matter which one you take; just pick one.
