Let's say I have a collection of users like this:
{ "_id" : "1234", "Name" : "John", "OS" : "5.1", "Groups" : [{ "_id" : "A", "Name" : "Group A" }, { "_id" : "C", "Name" : "Group C" }] }
And I have a collection of events like this:
{ "_id" : "15342", "Event" : "VIEW", "UserId" : "1234" }
I can use mapreduce to count the number of events for each user, since I can just emit the "UserId" and count it. However, what I want to do is count events by group.
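A minimal sketch of that per-user count, written in plain JavaScript that mimics map/reduce in memory rather than calling MongoDB. The `runMapReduce` helper, the function names, and the extra sample events are assumptions made up for illustration; only the document shape comes from the example above.

```javascript
// Sample events shaped like the document above; the extra rows are made up.
const events = [
  { _id: "15342", Event: "VIEW", UserId: "1234" },
  { _id: "15343", Event: "VIEW", UserId: "1234" },
  { _id: "15344", Event: "VIEW", UserId: "5678" },
];

// Map step: emit one (UserId, 1) pair per event.
function mapEvent(event, emit) {
  emit(event.UserId, 1);
}

// Reduce step: sum all values emitted for a single key.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical in-memory stand-in for a map/reduce run.
function runMapReduce(docs, map, reduce) {
  const buckets = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!buckets.has(key)) buckets.set(key, []);
      buckets.get(key).push(value);
    });
  }
  return [...buckets].map(([key, values]) => ({
    _id: key,
    count: reduce(key, values),
  }));
}

const perUser = runMapReduce(events, mapEvent, reduceCounts);
console.log(perUser);
```

The map function only sees one event at a time, which is why it can emit the "UserId" but has no group to emit.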
If there were an array of "Groups" in my event document, it would be easy, but there isn't one. This is just an example; the actual application is much more complicated, and I do not want to replicate all this data into the event document.
I have seen the example at http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/ , but I do not see how it applies in this situation, since it aggregates values from two places, whereas all I really want to do is a lookup.
In SQL, I would simply JOIN my flattened UserGroup table to the event table and just GROUP BY UserGroup.GroupName
I would be happy with a few passes of mapreduce: first count by UserId into something like {"_id": "1234", "count": 9}. But I'm stuck on the next pass: how do I bring in the group id?
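To make the missing second pass concrete, here is a rough sketch of what it would need to compute, in plain JavaScript over in-memory arrays rather than inside MongoDB. It assumes the per-user counts from the first pass and the user documents are both available to the client; the second user, the count of 4, and the `countByGroup` helper are made up for illustration.

```javascript
// User documents shaped like the example above; the second user is made up.
const users = [
  {
    _id: "1234",
    Name: "John",
    Groups: [
      { _id: "A", Name: "Group A" },
      { _id: "C", Name: "Group C" },
    ],
  },
  { _id: "5678", Name: "Jane", Groups: [{ _id: "A", Name: "Group A" }] },
];

// Output of the first pass: one count per user (values are made up).
const perUserCounts = [
  { _id: "1234", count: 9 },
  { _id: "5678", count: 4 },
];

// Index the groups by user id so that each lookup is cheap.
const groupsByUser = new Map(users.map((u) => [u._id, u.Groups]));

// Hypothetical helper: add each user's count to every group the user is in.
function countByGroup(counts, groupIndex) {
  const totals = {};
  for (const row of counts) {
    const groups = groupIndex.get(row._id) || [];
    for (const group of groups) {
      totals[group._id] = (totals[group._id] || 0) + row.count;
    }
  }
  return totals;
}

const perGroup = countByGroup(perUserCounts, groupsByUser);
console.log(perGroup);
```

Note that a user in several groups contributes their events to each of those groups, which matches what the SQL JOIN plus GROUP BY would give.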
Some potential approaches that I have considered:
- Include group information in an event document (not possible)
- Find out how to "join" the user collection, or look up the user's groups, from within the map function so that I can also emit the group ID (I don't know how to do this).
- Figure out how to "join" the event and user collections into a third collection that I can then run mapreduce on.
What is possible and what are the benefits / problems with each approach?