I found some map_reduce tutorials, but none of them seem to use a where clause or any other way to exclude documents/records from the computation. I am working on what seems like an easy query. I have a main event log collection with timestamps, IP addresses, and campaign IDs, and I want the number of unique users (distinct IPs) for one campaign within a given time range. It sounds easy!
I built a query object that looks something like this:
{'ts': {'$gt': 1345840456, '$lt': 2345762454}, 'cid': '2636518'}
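In Python this filter is just a dict. A small helper (the name `build_campaign_query` is hypothetical, not part of PyMongo) keeps the time bounds and campaign ID in one place. The field names `ts` and `cid` come from the query above, and storing `cid` as a string is assumed from that example.

```python
def build_campaign_query(start_ts, end_ts, cid):
    """Return a MongoDB filter for one campaign's events strictly between two timestamps."""
    return {
        "ts": {"$gt": start_ts, "$lt": end_ts},  # exclusive bounds, as in the query above
        "cid": cid,                              # campaign ID, stored as a string here
    }


query = build_campaign_query(1345840456, 2345762454, "2636518")
```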
With it I tried two approaches, one using distinct and the other using map_reduce:
Distinct
db.alpha2.find(query).distinct('ip').count()
In the mongo shell you can pass the query as the second argument to distinct(), and that works, but I read that you cannot do this in pymongo.
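For reference, a cursor's `distinct()` already applies the filter given to `find()`, but it returns a plain Python list, so the result is counted with `len()` rather than `.count()` (`list.count()` requires an argument). In recent PyMongo releases `Collection.distinct(key, filter)` also accepts the filter directly, as in the shell. A minimal sketch, assuming `collection` is a PyMongo `Collection` and `query` is the dict above; the function name is hypothetical:

```python
def count_unique_ips(collection, query):
    """Count distinct 'ip' values among documents matching `query`.

    `collection` is assumed to be a PyMongo Collection, or any object with
    the same distinct(key, filter) method.
    """
    # distinct() returns a list of unique values, so its length is the answer
    return len(collection.distinct("ip", query))
```

Usage would be `count_unique_ips(db.alpha2, query)`, or equivalently `len(db.alpha2.find(query).distinct("ip"))`. Note that the distinct result is returned as a single document, so it is subject to MongoDB's 16 MB document size limit.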
Map_reduce
from bson.code import Code

map = Code("function () {"
           " emit(this.ip, 1);"
           "}")
reduce = Code("function (key, values) {"
              " var total = 0;"
              " for (var i = 0; i < values.length; i++) {"
              " total += values[i];"
              " }"
              " return total;"
              "}")
totaluniqueimp = db.alpha2.map_reduce(map, reduce, "myresults").count()
(I understand that the reduce function does more than I need; I took it from the demo.) This works, but it does not apply my "where" conditions. So I tried this:
totaluniqueimp = db.alpha2.find(query).map_reduce(map, reduce, "myresults").count()
And I get this error:
AttributeError: 'Cursor' object has no attribute 'map_reduce'
Conclusion
Basically, this is what I am trying to do in MySQL:
select count(distinct ipaddress) from records where ts<1000 and ts>900 and campaignid=234
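On current MongoDB versions, the closest equivalent of that SQL is probably the aggregation pipeline rather than map_reduce (MongoDB deprecated mapReduce in 5.0, and PyMongo 4 removed `Collection.map_reduce`). A sketch, assuming the same `query` dict and an `ip` field: `$match` plays the role of WHERE, `$group` on `$ip` the role of GROUP BY, and `$count` counts the resulting groups. The function names are hypothetical.

```python
def unique_ip_pipeline(query):
    """Build an aggregation pipeline that counts distinct IPs among matching events."""
    return [
        {"$match": query},           # WHERE ts > ... AND ts < ... AND cid = ...
        {"$group": {"_id": "$ip"}},  # one output document per distinct IP
        {"$count": "unique_ips"},    # number of groups, i.e. COUNT(DISTINCT ip)
    ]


def count_unique_ips_agg(collection, query):
    """Run the pipeline; `collection` is assumed to be a PyMongo Collection."""
    results = list(collection.aggregate(unique_ip_pipeline(query)))
    # $count emits no document at all when nothing matched, so default to 0
    return results[0]["unique_ips"] if results else 0
```

Unlike map_reduce with an output collection, this writes nothing to the database and returns the count directly.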
It seems so simple! How do you do this in mongo?
UPDATE: ANSWER
Based on Dmitry's answer below, I was able to solve (and simplify) my problem (is this the simplest way to do it?):
# query is the dict built above
from bson.code import Code

map = Code("function () { emit(this.ip, 1); }")
reduce = Code("function (key, values) { return 1; }")
# map_reduce is a Collection method, not a Cursor method; the filter goes in the query= keyword
totaluniqueimp = collection.map_reduce(map, reduce, "myresults", query=query).count()
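Why `return 1` is enough: map_reduce writes one output document per distinct emitted key, so only the number of keys matters, not the reduced value. The plain-Python sketch below mimics that flow on a hypothetical in-memory list of sample events (it is an illustration, not how the server executes map-reduce): filter with the query, emit `(ip, 1)`, group by key, reduce, then count the keys. The matcher only handles `$gt`, `$lt`, and equality, which is all the query above uses.

```python
from collections import defaultdict


def matches(doc, query):
    """Tiny matcher supporting equality plus $gt/$lt, enough for the query above."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
            if "$lt" in cond and not (value is not None and value < cond["$lt"]):
                return False
        elif value != cond:
            return False
    return True


def unique_ips_map_reduce(docs, query):
    """Emulate map_reduce(map, reduce, out, query=query).count() in memory."""
    emitted = defaultdict(list)
    for doc in docs:
        if matches(doc, query):           # the query= option filters before map runs
            emitted[doc["ip"]].append(1)  # map: emit(this.ip, 1)
    # reduce: function (key, values) { return 1; }, giving one result per distinct key
    results = {ip: 1 for ip in emitted}
    return len(results)                   # .count() on the output collection


events = [
    {"ts": 950, "ip": "10.0.0.1", "cid": "234"},
    {"ts": 960, "ip": "10.0.0.1", "cid": "234"},
    {"ts": 970, "ip": "10.0.0.2", "cid": "234"},
    {"ts": 980, "ip": "10.0.0.3", "cid": "999"},   # other campaign
    {"ts": 1200, "ip": "10.0.0.4", "cid": "234"},  # outside the time range
]
print(unique_ips_map_reduce(events, {"ts": {"$gt": 900, "$lt": 1000}, "cid": "234"}))  # prints 2
```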
Thank you Dmitry!