TL;DR
In modern releases, you should use $reduce with $setUnion after the original $group, as shown:
db.collection.aggregate([
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "count": { "$sum": 1 },
    "tags": { "$addToSet": "$tags" }
  }},
  { "$addFields": {
    "tags": {
      "$reduce": {
        "input": "$tags",
        "initialValue": [],
        "in": { "$setUnion": [ "$$value", "$$this" ] }
      }
    }
  }}
])
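To see why the extra stage is needed, it may help to mimic it outside the database. The following is a plain JavaScript sketch, not MongoDB API: `mergeTagSets` is a hypothetical helper, and the sample arrays are assumed input chosen to match the tags shown in the output further down. After the $group, $addToSet leaves "tags" as a set of distinct arrays, and the $reduce folds those arrays into one set of distinct values.

```javascript
// Plain JavaScript stand-in for the $reduce / $setUnion stage (not MongoDB API).
// `mergeTagSets` is a hypothetical helper name used only for this illustration.
function mergeTagSets(arrays) {
  // Mirrors "initialValue": [] and "in": { "$setUnion": [ "$$value", "$$this" ] }
  return arrays.reduce(
    (value, current) => [...new Set([...value, ...current])],
    []
  );
}

// Assumed sample: what $addToSet leaves in "tags" after the $group,
// i.e. distinct arrays rather than distinct tags.
const groupedTags = [["tag1", "tag2"], ["tag2", "tag3"]];

const merged = mergeTagSets(groupedTags);
console.log(merged.sort()); // [ 'tag1', 'tag2', 'tag3' ]
```

The sort is only there to make the printed result predictable. $setUnion itself does not guarantee any particular order of the elements in its output.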
You were right to look at the $addToSet operator, but when working with content in an array you generally need to process it with $unwind first. This "de-normalizes" the array entries and essentially makes a "copy" of the parent document for each array entry, with that entry as a singular value in the field. That is what you need in order to avoid the behavior you see without it.
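To make the "de-normalizing" concrete, here is a plain JavaScript imitation of what $unwind does to a single document. It is a sketch only, not MongoDB API: the `unwind` helper and the sample document are hypothetical, and the real stage has extra behavior (for example around empty or missing arrays) that is not modeled here.

```javascript
// Plain JavaScript imitation of $unwind (not MongoDB API).
// `unwind` is a hypothetical helper; it assumes doc[field] is a non-empty array.
function unwind(doc, field) {
  // One output document per array element, with the array replaced by that element
  return doc[field].map((item) => ({ ...doc, [field]: item }));
}

// Assumed sample document, using the field names from the question
const doc = { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] };

// Produces two copies of the parent document:
// one with tags: "tag1" and one with tags: "tag2"
console.log(unwind(doc, "tags"));
```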
Your "count" poses an interesting problem, although it is easily solved with a "double unwind" after the initial $group:
db.collection.aggregate([
That final bit with $project is also there because I used "temporary" names for each of the fields in the other stages of the aggregation pipeline. This is because $project "copies" the fields that already exist in the incoming document, in the order they already appeared, before any new fields are added to the document.
Otherwise, the output will look like this:
{ "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }
Here the fields are not in the order you might expect. Trivial really, but it matters to some people, so it is worth explaining why it happens and how to handle it.
So $unwind does the work of keeping the items separate rather than in arrays, and doing the $group first allows you to get the "count" of the occurrences of the "grouping" key.
The $first operator, which is used later, "keeps" the value of "count", because it just got "duplicated" for each value that is present in the "tags" array. It's all the same value, so it doesn't matter. Just pick one.
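Putting those pieces together, a minimal sketch of a double-unwind pipeline along the lines described here might look like the following. Treat it as an illustration under assumptions rather than the exact listing: the temporary names `tcount` and `ttags` are placeholders chosen for this example, and the pipeline is built as a plain array so the snippet also runs outside the shell.

```javascript
// Sketch of a "double unwind" pipeline. The temporary field names
// "tcount" and "ttags" are assumptions made for this illustration.
const pipeline = [
  // Group on the compound key first to count the occurrences;
  // "ttags" ends up as an array of arrays
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "tcount": { "$sum": 1 },
    "ttags": { "$push": "$tags" }
  }},

  // Unwind twice: once for the outer array, once for each inner array
  { "$unwind": "$ttags" },
  { "$unwind": "$ttags" },

  // Collect the distinct tags; "tcount" was duplicated onto every
  // unwound document, so $first simply picks one copy
  { "$group": {
    "_id": "$_id",
    "tcount": { "$first": "$tcount" },
    "ttags": { "$addToSet": "$ttags" }
  }},

  // Rename the temporary fields so every output field is "new"
  { "$project": {
    "_id": 0,
    "Host": "$_id.Host",
    "ArtId": "$_id.ArtId",
    "count": "$tcount",
    "tags": "$ttags"
  }}
];

// In the mongo shell this would be run as:
// db.collection.aggregate(pipeline)
```

Because none of the projected names exist in the incoming documents, the fields in the $project are all added as new ones rather than copied over from the previous stage.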