Let's say I have a collection with documents that look like this (just a simplified example, but it should illustrate the idea):
> db.data.find()
{ "_id" : ObjectId("4e9c1f27aa3dd60ee98282cf"), "type" : "A", "value" : 11 }
{ "_id" : ObjectId("4e9c1f33aa3dd60ee98282d0"), "type" : "A", "value" : 58 }
{ "_id" : ObjectId("4e9c1f40aa3dd60ee98282d1"), "type" : "B", "value" : 37 }
{ "_id" : ObjectId("4e9c1f50aa3dd60ee98282d2"), "type" : "B", "value" : 1 }
{ "_id" : ObjectId("4e9c1f56aa3dd60ee98282d3"), "type" : "A", "value" : 85 }
{ "_id" : ObjectId("4e9c1f5daa3dd60ee98282d4"), "type" : "B", "value" : 12 }
Now I need to collect some statistics on this collection. For example:
db.data.mapReduce(
  function () { emit(this.type, this.value); },
  function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) { total += values[i]; }
    return total;
  },
  { out: 'stat' }
)
will collect totals in the stat collection.
> db.stat.find()
{ "_id" : "A", "value" : 154 }
{ "_id" : "B", "value" : 50 }
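To make the example concrete, here is a plain-JavaScript simulation of what the map and reduce functions above do with the six sample documents. It runs without MongoDB; `mapReduce` here is a hypothetical local helper, not the shell method, and it simplifies MongoDB's behavior (for instance, MongoDB may skip calling reduce for a key that has only one emitted value).

```javascript
// Plain-JavaScript simulation of the map/reduce above (no MongoDB needed).
// The sample documents mirror the type/value fields from db.data.find().
const data = [
  { type: "A", value: 11 },
  { type: "A", value: 58 },
  { type: "B", value: 37 },
  { type: "B", value: 1 },
  { type: "A", value: 85 },
  { type: "B", value: 12 },
];

// Same logic as the reduce function: sum all values emitted for one key.
function reduce(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Hypothetical local helper, not db.collection.mapReduce.
function mapReduce(docs) {
  // Map phase: group values by key, as emit(this.type, this.value) does.
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.type)) groups.set(doc.type, []);
    groups.get(doc.type).push(doc.value);
  }
  // Reduce phase: one output document per key, shaped like { _id, value }.
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduce(key, values) });
  }
  return out;
}

const stat = mapReduce(data);
console.log(stat); // A: 154, B: 50, matching db.stat.find()
```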
Everything works so far, but I'm stuck on the next step:
- The collection 'data' is constantly updated with new data (old documents remain unchanged; documents are only inserted, never updated)
- I would like to periodically update the "stat" collection, but I do not want to query the entire data collection every time, so I want to run an incremental mapReduce
- It might seem simpler to just update the "stat" collection on each insert into the data collection instead of using mapReduce, but the real case is more complicated than this example, and I would like to compute statistics only on demand.
- To do this, I need to be able to query only the documents that have been added since my last mapReduce
- As far as I understand, I cannot rely on ObjectId for this, i.e. just save the last one and then select every document with an ObjectId greater than the stored one, because ObjectId is not an autoincrement identifier like in SQL databases (for example, different shards will generate different ObjectIds).
- I could change the ObjectId generator, but I am not sure how best to do that in a sharded environment.
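One way to frame the incremental approach is a watermark: give each document an explicitly increasing field and remember the highest value already folded into the statistics. The sketch below simulates that in plain JavaScript, assuming a hypothetical application-assigned `seq` field that increases in insertion order; guaranteeing that across shards is exactly the open problem above, so treat it as an assumption. In the mongo shell, the two pieces would roughly correspond to the `query` option of `mapReduce` (e.g. `{ seq: { $gt: lastSeq } }`) and `out: { reduce: "stat" }`, which runs the reduce function over the value already stored in `stat` together with the newly computed one. That only yields correct totals if reduce can consume its own output, which a plain sum does.

```javascript
// Hypothetical incremental aggregation with a watermark. Each document is
// assumed to carry an application-assigned, strictly increasing `seq`
// field; this is an assumption, not something MongoDB provides.

// Re-reducible sum: feeding a previous total back in gives the same result
// as reducing all raw values at once.
function reduce(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// State kept between runs: the totals (like the "stat" collection)
// and the highest `seq` already processed.
const state = { stat: new Map(), lastSeq: 0 };

function incrementalRun(state, docs) {
  // Only documents newer than the watermark; in MongoDB this filter would be
  // the mapReduce `query` option, ideally backed by an index on `seq`.
  const fresh = docs.filter((d) => d.seq > state.lastSeq);

  // Map phase over the new documents only.
  const groups = new Map();
  for (const d of fresh) {
    if (!groups.has(d.type)) groups.set(d.type, []);
    groups.get(d.type).push(d.value);
  }

  // Merge with existing totals by re-reducing, as out: { reduce: "stat" } does.
  for (const [key, values] of groups) {
    const partial = reduce(key, values);
    const previous = state.stat.get(key);
    state.stat.set(key, previous === undefined ? partial : reduce(key, [previous, partial]));
  }

  // Advance the watermark only after the new documents are folded in.
  for (const d of fresh) state.lastSeq = Math.max(state.lastSeq, d.seq);
  return state;
}

// Usage: the first run sees the six sample documents, the second only two new ones.
const data = [
  { seq: 1, type: "A", value: 11 },
  { seq: 2, type: "A", value: 58 },
  { seq: 3, type: "B", value: 37 },
  { seq: 4, type: "B", value: 1 },
  { seq: 5, type: "A", value: 85 },
  { seq: 6, type: "B", value: 12 },
];
incrementalRun(state, data); // A: 154, B: 50
data.push({ seq: 7, type: "A", value: 6 }, { seq: 8, type: "C", value: 3 });
incrementalRun(state, data); // only seq 7 and 8 processed: A: 160, B: 50, C: 3
```

Note that a document which becomes visible with a `seq` below the current watermark would be silently skipped, which is why the field really has to be monotonic in insertion order for this scheme to be correct.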
So the question is:
Is it possible to select only the documents added since the last mapReduce in order to run an incremental mapReduce, or is there another strategy for updating statistics on an ever-growing collection?
mongodb mapreduce
Hitosu