Using $regex with $group in the MongoDB aggregation framework

Consider the following example:

 db.article.aggregate(
   { $group : {
       _id : "$author",
       docsPerAuthor : { $sum : 1 },
       viewsPerAuthor : { $sum : "$pageViews" }
   }}
 );

It groups by the author field and computes two fields per group.

My author values have the form FirstName_LastName. Now, instead of grouping by $author, I want to group all authors who share the same LastName.

I tried $regex to group by the part of the string that follows the '_':

 $author.match(/_[a-zA-Z0-9]+$/)

 db.article.aggregate(
   { $group : {
       _id : "$author".match(/_[a-zA-Z0-9]+$/),
       docsPerAuthor : { $sum : 1 },
       viewsPerAuthor : { $sum : "$pageViews" }
   }}
 );

I also tried the following:

 db.article.aggregate(
   { $group : {
       _id : { $author: { $regex: /_[a-zA-Z0-9]+$/ } },
       docsPerAuthor : { $sum : 1 },
       viewsPerAuthor : { $sum : "$pageViews" }
   }}
 );
regex mongodb aggregation-framework
3 answers

Actually, there is no operator that provides this functionality, or at least I could not find a version that includes one. I don't think this will work with $regex: http://docs.mongodb.org/manual/reference/operator/regex/ is only for pattern matching.
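As for the first attempt in the question, it fails for a different reason: the shell evaluates "$author".match(...) on the client, against the literal string "$author", before anything is sent to the server. A minimal plain-JavaScript sketch of what the server would receive (no MongoDB needed to run it; the variable names are mine):

```javascript
// The literal string "$author" contains no underscore, so match() returns null.
var groupId = "$author".match(/_[a-zA-Z0-9]+$/);

// This is the stage the shell would actually send to the server.
var groupStage = {
  $group: {
    _id: groupId,
    docsPerAuthor: { $sum: 1 },
    viewsPerAuthor: { $sum: "$pageViews" }
  }
};

console.log(groupStage.$group._id); // null
```

With _id set to null, all documents fall into a single group, so nothing is grouped by last name.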

There is an improvement request in Jira: https://jira.mongodb.org/browse/SERVER-6773

It is still open. However, on GitHub I found this discussion: https://github.com/mongodb/mongo/pull/336

And if you check this commit: https://github.com/nleite/mongo/commit/2dd175a5acda86aaad61f5eb9dab83ee19915709

it contains more or less exactly the method you are trying to use. I do not fully understand the status of this improvement; in 2.2.3 it does not work.
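For readers on newer servers: as far as I know, MongoDB 3.4 added the $split string operator, which makes it possible to extract the last name inside the pipeline without a regex. A sketch under that assumption (it does not apply to 2.2.3; the field names come from the question and every author is assumed to look like FirstName_LastName):

```javascript
// Assumes MongoDB 3.4+ ($split) and author values of the form FirstName_LastName.
var pipeline = [
  {
    $group: {
      // Split on "_" and use the last piece as the group key.
      _id: { $arrayElemAt: [ { $split: [ "$author", "_" ] }, -1 ] },
      docsPerAuthor: { $sum: 1 },
      viewsPerAuthor: { $sum: "$pageViews" }
    }
  }
];

// In the mongo shell this would be run as: db.article.aggregate(pipeline)
```

An author value without any "_" would be grouped under its full value, since splitting it yields a single-element array.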


You can use mapReduce, which is a more general form of aggregation. Here is how to do it in the mongo shell. Define the map function:

 var mapFunction = function() {
     // Group key: the "_LastName" suffix of the author field
     var match = this.author.match(/_[a-zA-Z0-9]+$/);
     if (!match) { return; }  // skip authors without a "_LastName" suffix
     var value = {
         docsPerAuthor: 1,
         viewsPerAuthor: this.pageViews  // a number, as in the question's $sum
     };
     emit(match[0], value);
 };

and the reduce function:

 var reduceFunction = function(key, values) {
     // Same shape as the emitted values, so the result can be reduced again
     var reducedObject = { docsPerAuthor: 0, viewsPerAuthor: 0 };
     values.forEach(function(value) {
         reducedObject.docsPerAuthor += value.docsPerAuthor;
         reducedObject.viewsPerAuthor += value.viewsPerAuthor;
     });
     return reducedObject;
 };

Run mapReduce and store the result in the map_reduce_result collection:

 > db.article.mapReduce(mapFunction, reduceFunction, { out: 'map_reduce_result' })

Query map_reduce_result to see the result:

 >db.map_reduce_result.find() 
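To check the map and reduce logic without a server, the steps above can be simulated in plain JavaScript. This is only a sketch: the sample documents are made up, emit() is a local stand-in for the one the server provides, and the functions are simplified versions of the ones above (the key here excludes the underscore, and pageViews is assumed to be a number).

```javascript
// Hypothetical sample documents; field names follow the question.
var docs = [
  { author: "John_Smith", pageViews: 10 },
  { author: "Jane_Smith", pageViews: 5 },
  { author: "Bob_Jones", pageViews: 7 }
];

// Local stand-in for the server's emit(): collect values per key.
var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

var mapFunction = function () {
  // Capture the last name after the final "_"; skip documents that do not match.
  var m = this.author.match(/_([a-zA-Z0-9]+)$/);
  if (m) {
    emit(m[1], { docsPerAuthor: 1, viewsPerAuthor: this.pageViews });
  }
};

var reduceFunction = function (key, values) {
  var reduced = { docsPerAuthor: 0, viewsPerAuthor: 0 };
  values.forEach(function (value) {
    reduced.docsPerAuthor += value.docsPerAuthor;
    reduced.viewsPerAuthor += value.viewsPerAuthor;
  });
  return reduced;
};

// Map phase: call the map function with each document bound to `this`.
docs.forEach(function (doc) { mapFunction.call(doc); });

// Reduce phase: one call per key (the server skips reduce for single-value keys).
var result = {};
Object.keys(emitted).forEach(function (key) {
  result[key] = reduceFunction(key, emitted[key]);
});

console.log(result);
```

For the sample data this gives two documents and 15 views for Smith, and one document and 7 views for Jones.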

A possible workaround within the aggregation framework is to use $project to compute the last name. It is messy, however, because you have to handle every possible first-name length by hand:

Here we compute the name field as the substring after the "_" character, trying each possible position in turn (hence the chain of $cond expressions) and falling back to the whole $author value if the first name is too long:

http://mongotry.herokuapp.com/#?bookmarkId=52fb5f24a0378802003b4c68

 [
   {
     "$project": {
       "author": 1,
       "pageViews": 1,
       "name": {
         "$cond": [
           { "$eq": [ { "$substr": [ "$author", 0, 1 ] }, "_" ] },
           { "$substr": [ "$author", 1, 999 ] },
           {
             "$cond": [
               { "$eq": [ { "$substr": [ "$author", 1, 1 ] }, "_" ] },
               { "$substr": [ "$author", 2, 999 ] },
               {
                 "$cond": [
                   { "$eq": [ { "$substr": [ "$author", 2, 1 ] }, "_" ] },
                   { "$substr": [ "$author", 3, 999 ] },
                   {
                     "$cond": [
                       { "$eq": [ { "$substr": [ "$author", 3, 1 ] }, "_" ] },
                       { "$substr": [ "$author", 4, 999 ] },
                       {
                         "$cond": [
                           { "$eq": [ { "$substr": [ "$author", 4, 1 ] }, "_" ] },
                           { "$substr": [ "$author", 5, 999 ] },
                           "$author"
                         ]
                       }
                     ]
                   }
                 ]
               }
             ]
           }
         ]
       }
     }
   },
   {
     "$group": {
       "_id": "$name",
       "viewsPerAuthor": { "$sum": "$pageViews" }
     }
   }
 ]
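Rather than typing the chain by hand, it can be generated in the shell. A small sketch: buildNameExpr is a hypothetical helper of mine, not a MongoDB function, and maxPos is the largest underscore position you want to support.

```javascript
// Build the nested $cond chain for underscore positions 0..maxPos.
function buildNameExpr(field, maxPos) {
  // Innermost fallback: the whole field, used when no "_" is found in range.
  var expr = field;
  // Wrap from the last position outwards so that position 0 is tested first.
  for (var pos = maxPos; pos >= 0; pos--) {
    expr = {
      $cond: [
        { $eq: [ { $substr: [ field, pos, 1 ] }, "_" ] },
        { $substr: [ field, pos + 1, 999 ] },
        expr
      ]
    };
  }
  return expr;
}

// Same pipeline as above, with the chain generated for positions 0..4.
var pipeline = [
  { $project: { author: 1, pageViews: 1, name: buildNameExpr("$author", 4) } },
  { $group: { _id: "$name", viewsPerAuthor: { $sum: "$pageViews" } } }
];
```

Raising the second argument supports longer first names at the cost of a deeper expression.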