MongoDB select count (different x) in an indexed column - count unique results for large datasets

I looked through several articles and examples and have not yet found an effective way to make this SQL query in MongoDB (where there are millions of documents rows )

First try

(for example, from this almost duplicated question - Mongolian equivalent of SQL SELECT DISTINCT? )

db.myCollection.distinct("myIndexedNonUniqueField").length 

Obviously, I got this error, since my data array is huge

 Thu Aug 02 12:55:24 uncaught exception: distinct failed: { "errmsg" : "exception: distinct too big, 16mb cap", "code" : 10044, "ok" : 0 } 

Second attempt

I decided to try and make a band

 db.myCollection.group({key: {myIndexedNonUniqueField: 1}, initial: {count: 0}, reduce: function (obj, prev) { prev.count++;} } ); 

But I got this error message:

 exception: group() can't handle more than 20000 unique keys 

Third attempt

I haven't tried it yet, but there are a few suggestions that include mapReduce

eg.

AND

There seems to be a porting request to GitHub fixing the .distinct method to mention that it should only return an invoice, but it is still open: https://github.com/mongodb/mongo/pull/34

But at that moment I thought it was worth asking here what is the last on this subject? Should I upgrade to SQL or another NoSQL database for different accounts? or is there an effective way?

Update:

This comment on MongoDB white papers is not encouraging, is this true?

http://www.mongodb.org/display/DOCS/Aggregation#comment-430445808

Update2:

The new aggregation structure seems to be responding to the above comment ... (MongoDB 2.1 / 2.2 and above, the preview is available, not for production)

http://docs.mongodb.org/manual/applications/aggregation/

+74
mongodb
Aug 02 2018-12-12T00:
source share
3 answers

1) The easiest way to do this is through an aggregation structure. This takes two $ group commands: the first is grouped by individual values, the second is all different values

 pipeline = [ { $group: { _id: "$myIndexedNonUniqueField"} }, { $group: { _id: 1, count: { $sum: 1 } } } ]; // // Run the aggregation command // R = db.runCommand( { "aggregate": "myCollection" , "pipeline": pipeline } ); printjson(R); 

2) If you want to do this with Map / Reduce, you can. This is also a two-phase process: at the first stage, we create a new collection with a list of each individual value for the key. In the second, we make an account () in the new collection.

 var SOURCE = db.myCollection; var DEST = db.distinct DEST.drop(); map = function() { emit( this.myIndexedNonUniqueField , {count: 1}); } reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; // count each distinct value for lagniappe }); return {count: count}; }; // // run map/reduce // res = SOURCE.mapReduce( map, reduce, { out: 'distinct', verbose: true } ); print( "distinct count= " + res.counts.output ); print( "distinct count=", DEST.count() ); 

Please note that you cannot return the result of the map / reduce the built-in, as this will potentially exceed the document size limit of 16 MB. You can save the calculation to the collection, and then calculate () the size of the collection or get the number of results from the return value of mapReduce ().

+64
Aug 02 2018-12-12T00:
source share
 db.myCollection.aggregate( {$group : {_id : "$myIndexedNonUniqueField"} }, {$group: {_id:1, count: {$sum : 1 }}}); 

directly to the result:

 db.myCollection.aggregate( {$group : {_id : "$myIndexedNonUniqueField"} }, {$group: {_id:1, count: {$sum : 1 }}}) .result[0].count; 
+36
Mar 04 '13 at 21:32
source share

The following solution worked for me

db.test.distinct ('user'); ["alex", "England", "France", "Australia"]

db.countries.distinct ('country'). Length 4

+3
May 25 '17 at 6:06 a.m.
source share



All Articles