MongoDB Lists - Get Every Nth Element

I have a MongoDB schema that looks something like this:

    [
      {
        "name" : "name1",
        "instances" : [
          { "value" : 1, "date" : ISODate("2015-03-04T00:00:00.000Z") },
          { "value" : 2, "date" : ISODate("2015-04-01T00:00:00.000Z") },
          { "value" : 2.5, "date" : ISODate("2015-03-05T00:00:00.000Z") },
          ...
        ]
      },
      {
        "name" : "name2",
        "instances" : [ ... ]
      }
    ]

where the number of instances for each element can be quite large.

Sometimes I want to get only a sample of the data, i.e. every 3rd instance or every 10th instance... you get the idea.

I can achieve this by fetching all the instances and filtering them in my server code, but I was wondering if there is a way to do it with an aggregation query.
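
For reference, the server-side filtering I mean is roughly the following. This is only a minimal sketch in plain JavaScript; the helper name everyNth and the sample values are made up for illustration.

```javascript
// Hypothetical helper: keep the nth, 2nth, 3nth, ... elements of an array.
function everyNth(items, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  // idx is zero-based, so (idx + 1) % n === 0 picks the 3rd, 6th, ... items for n = 3.
  return items.filter(function (el, idx) {
    return (idx + 1) % n === 0;
  });
}

// Sample document shaped like the schema above (values are invented).
var doc = {
  name: "name1",
  instances: [
    { value: 1 }, { value: 2 }, { value: 2.5 },
    { value: 3 }, { value: 4 }, { value: 5 }
  ]
};

var sampled = everyNth(doc.instances, 3);
console.log(sampled.map(function (i) { return i.value; })); // [ 2.5, 5 ]
```

This works, but it means transferring every instance to the server first, which is what I would like to avoid.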

Any ideas?


Update

Assuming the data structure was flat, as suggested below by @SylvainLeroux, i.e.:

    [
      {"name": "name1", "value": 1, "date": ISODate("2015-03-04T00:00:00.000Z")},
      {"name": "name2", "value": 5, "date": ISODate("2015-04-04T00:00:00.000Z")},
      {"name": "name1", "value": 2, "date": ISODate("2015-04-01T00:00:00.000Z")},
      {"name": "name1", "value": 2.5, "date": ISODate("2015-03-05T00:00:00.000Z")},
      ...
    ]

would it be easier to get every Nth element (for a specific name)?

+15
mongodb mongodb-query aggregation-framework
6 answers

You might like this approach, which uses the $lookup aggregation stage. It is probably the most convenient and fastest way, without any aggregation tricks.

Create a Names collection with the following schema:

    [
      { "_id": 1, "name": "name1" },
      { "_id": 2, "name": "name2" }
    ]

and then an Instances collection whose documents carry the parent id as "nameId":

    [
      { "nameId": 1, "value" : 1, "date" : ISODate("2015-03-04T00:00:00.000Z") },
      { "nameId": 1, "value" : 2, "date" : ISODate("2015-04-01T00:00:00.000Z") },
      { "nameId": 1, "value" : 3, "date" : ISODate("2015-03-05T00:00:00.000Z") },
      { "nameId": 2, "value" : 7, "date" : ISODate("2015-03-04T00:00:00.000Z") },
      { "nameId": 2, "value" : 8, "date" : ISODate("2015-04-01T00:00:00.000Z") },
      { "nameId": 2, "value" : 4, "date" : ISODate("2015-03-05T00:00:00.000Z") }
    ]

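Now, with the $lookup pipeline syntax available since MongoDB 3.6, you can use $sample inside the $lookup pipeline to retrieve N randomly chosen instances per name. Note that this gives a random sample of size N rather than strictly every Nth element.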

    db.Names.aggregate([
      { "$lookup": {
        "from": Instances.collection.name,
        "let": { "nameId": "$_id" },
        "pipeline": [
          { "$match": { "$expr": { "$eq": ["$nameId", "$$nameId"] } } },
          { "$sample": { "size": N } }
        ],
        "as": "instances"
      }}
    ])

You can check it out here.

+3

Your question asks to "get every nth instance", which seems like a pretty clear question.

Query operations such as .find() can really only return the document "as is", with the exception of general field "selection" in the projection and operators such as the positional $ match or $elemMatch, which allow a single matched array element.

Of course, there is $slice, but it only allows a "range selection" on the array, so again it does not apply.

The "only" things that can actually modify the result on the server are .aggregate() and .mapReduce(). The former does not play very well with "slicing" arrays, at least not by every "n" elements. However, since the "function()" arguments of mapReduce are JavaScript-based logic, you have a bit more room to play.

For analytical processes, and for analytical purposes only, simply filter the array contents through mapReduce using .filter():

    db.collection.mapReduce(
      function() {
        var id = this._id;
        delete this._id;

        // filter the content of "instances" to every 3rd item only
        this.instances = this.instances.filter(function(el,idx) {
          return ((idx+1) % 3) == 0;
        });
        emit(id,this);
      },
      function() {},
      { "out": { "inline": 1 } } // or output to collection as required
    )

This is really just a "JavaScript runner" at this point, but if it is only for analysis/testing then there is nothing wrong with the concept. Of course, the output is not "exactly" how your document is structured, but it is as close a facsimile as mapReduce can get.
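
To see what that output shape looks like without a server, here is a minimal sketch that emulates the map step in plain JavaScript. The emit() stand-in, the function name mapEveryThird and the sample document are assumptions for illustration; real mapReduce supplies emit() itself and stores each result as a document with "_id" and "value" fields.

```javascript
// Stand-in for mapReduce's emit(): collects results in the { _id, value } shape.
var results = [];
function emit(key, value) {
  results.push({ _id: key, value: value });
}

// Same logic as the map function above, with "this" bound to the document.
function mapEveryThird() {
  var id = this._id;
  delete this._id;
  this.instances = this.instances.filter(function (el, idx) {
    return ((idx + 1) % 3) == 0;
  });
  emit(id, this);
}

// Invented sample document with four instances.
var sample = {
  _id: 1,
  name: "name1",
  instances: [{ value: 1 }, { value: 2 }, { value: 2.5 }, { value: 4 }]
};

mapEveryThird.call(sample);
console.log(JSON.stringify(results));
// [{"_id":1,"value":{"name":"name1","instances":[{"value":2.5}]}}]
```

The original fields end up nested under "value", which is the structural difference mentioned above.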

Another suggestion that I see here requires creating a new collection with all the "denormalized" elements and inserting an "index" from the array as part of the unique _id key. That may give you something you can query directly, but for "every nth element" you would still have to do:

    db.resultCollection.find({
      "_id.index": { "$in": [2,5,8,11,14] } // and so on ....
    })

So you have to work out and supply the index value of "every nth element" in order to get "every nth element". That does not really seem to solve the problem that was asked.
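
If you did go that route, the index list would have to be computed on the client, roughly as in this sketch. The helper name nthIndexes and the assumed array length of 15 are made up for illustration.

```javascript
// Hypothetical helper: zero-based indexes of the nth, 2nth, ... elements
// in an array of the given length.
function nthIndexes(n, length) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  var indexes = [];
  for (var i = n - 1; i < length; i += n) {
    indexes.push(i);
  }
  return indexes;
}

// Query document for every 3rd element, assuming arrays of at most 15 items.
var query = { "_id.index": { "$in": nthIndexes(3, 15) } };
console.log(JSON.stringify(query));
// {"_id.index":{"$in":[2,5,8,11,14]}}
```

The list grows with the longest array, so you need to know that length up front.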

If that output form seems more desirable for your "testing" purposes, then a better follow-up query on those results would be an aggregation pipeline with $redact:

    db.newCollection.aggregate([
      { "$redact": {
        "$cond": {
          "if": {
            "$eq": [
              { "$mod": [ { "$add": [ "$_id.index", 1 ] }, 3 ] },
              0
            ]
          },
          "then": "$$KEEP",
          "else": "$$PRUNE"
        }
      }}
    ])

This at least uses a "logical condition", much the same as the one used with .filter() before, to select just the "nth index" items without listing every possible index value as a query argument.

+5

Unfortunately, this is not possible with the aggregation framework, since it would require an option for $unwind to emit the array index/position, which aggregation currently cannot do. There is an open JIRA ticket for this, SERVER-4588.

However, a workaround would be to use MapReduce, but this comes at a huge performance cost, since the actual calculation of the array index is done by the embedded JavaScript engine (which is slow), and there is still a single global JavaScript lock that allows only one JavaScript thread to run at a time.

With mapReduce, you can try something like this:

Map function:

    var map = function(){
      for(var i=0; i < this.instances.length; i++){
        emit(
          { "_id": this._id, "index": i },
          { "index": i, "value": this.instances[i] }
        );
      }
    };

Reduce function:

 var reduce = function(){} 

Then you can run the following mapReduce operation on your collection:

 db.collection.mapReduce( map, reduce, { out : "resultCollection" } ); 

And then you can query the result collection to get a list/array of the elements at index N of each instances array, using the map() cursor method:

    var thirdInstances = db.resultCollection.find({"_id.index": N})
      .map(function(doc){ return doc.value.value })

+3

No need for $unwind here. You can use $push with $arrayElemAt to project the value of the array at the requested index inside the $group aggregation.

Something like

    db.colname.aggregate([
      { "$group": {
        "_id": null,
        "valuesatNthindex": { "$push": { "$arrayElemAt": ["$instances", N] } }
      }},
      { "$project": { "valuesatNthindex": 1 } }
    ])

+3

You can use the aggregation below:

    db.col.aggregate([
      { $project: {
        instances: {
          $map: {
            input: { $range: [ 0, { $size: "$instances" }, N ] },
            as: "index",
            in: { $arrayElemAt: [ "$instances", "$$index" ] }
          }
        }
      }}
    ])

$range generates a list of indexes. The third parameter is a nonzero step. For N = 2 it returns [0,2,4,6...], for N = 3 it returns [0,3,6,9...], and so on. Then you can use $map to get the corresponding elements from the instances array.
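
The same index arithmetic can be checked in plain JavaScript without a server. This is a minimal sketch; the function names range and everyNthFromZero are made up for illustration and only emulate the $range and $map/$arrayElemAt steps for a positive step.

```javascript
// Emulates { $range: [start, end, step] } for a positive integer step:
// start, start + step, ... while the value stays below end.
function range(start, end, step) {
  if (!Number.isInteger(step) || step < 1) {
    throw new RangeError("step must be a positive integer");
  }
  var out = [];
  for (var i = start; i < end; i += step) {
    out.push(i);
  }
  return out;
}

// Emulates the $map + $arrayElemAt projection on one in-memory array.
function everyNthFromZero(instances, n) {
  return range(0, instances.length, n).map(function (index) {
    return instances[index];
  });
}

console.log(range(0, 10, 3)); // [ 0, 3, 6, 9 ]
console.log(everyNthFromZero(["a", "b", "c", "d", "e", "f", "g"], 2)); // [ 'a', 'c', 'e', 'g' ]
```

Because the range starts at 0, the selection always includes the first element. Starting the range at N - 1 instead would select the Nth, 2Nth, ... elements.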

+3

Or with just a find() call:

    db.Collection.find({}).then(function(data) {
      var ret = [];
      for (var i = 0, len = data.length; i < len; i++) {
        if (i % 3 === 0) {
          ret.push(data[i]);
        }
      }
      return ret;
    });

It returns a promise, whose then() you can call to receive the data selected modulo N.

+2
