Querying inside a MongoDB map-reduce function

I streamed and saved about 250,000 tweets in MongoDB, and here I retrieve them, as you can see, based on a word or keyword present in the tweet.

    Mongo mongo = new Mongo("localhost", 27017);
    DB db = mongo.getDB("TwitterData");
    DBCollection collection = db.getCollection("publicTweets");

    BasicDBObject fields = new BasicDBObject().append("tweet", 1).append("_id", 0);
    BasicDBObject query = new BasicDBObject("tweet", new BasicDBObject("$regex", "autobiography"));
    DBCursor cur = collection.find(query, fields);

What I would like to do is use map-reduce: classify each tweet based on a keyword, then pass it to a reduce function that counts the number of tweets in each category, similar to what you see here. In that example, the number of pages is counted because it is a simple number. I want to do something like:

    "if (this.tweet.contains("kword1")) " +
        "category = 'kword1 tweets'; " +
    "else if (this.tweet.contains("kword2")) " +
        "category = 'kword2 tweets';

and then use the reduce function to get a count, as in the example program.

I know the syntax is incorrect, but that is pretty much what I would like to do. Is there any way to achieve it? Thanks!

PS: Oh, and I'm coding in Java, so Java syntax would be highly appreciated. Thanks!

The output of the code posted above looks something like this:

    { "tweet" : "An autobiography is a book that reveals nothing bad about its writer except his memory."}
    { "tweet" : "I refuse to read anything that not real the only thing I've read since biff books is Jordan autobiography #lol"}
    { "tweet" : "well we've had the 2012 publication of Ashley Good Books, I predict 2013 will be seeing an autobiography ;)"}

This, of course, is for all tweets containing the word "autobiography". I would like to do this matching in the map function, classify each match as an "Autobiography Tweet" (and likewise for other keywords), and then pass it to a reduce function that counts everything and returns the number of tweets containing each word.

Something like:

    {"_id" : "Autobiography Tweets" , "value" : { "publicTweets" : 3.0}}
    {"_id" : "Biography Tweets" , "value" : { "publicTweets" : 15.0}}
2 answers

You might want to try the following:

    String map = "function() { " +
        " var regex1 = new RegExp('autobiography', 'i'); " +
        " var regex2 = new RegExp('book', 'i'); " +
        " if (regex1.test(this.tweet) ) " +
        "     emit('Autobiography Tweet', 1); " +
        " else if (regex2.test(this.tweet) ) " +
        "     emit('Book Tweet', 1); " +
        " else " +
        "     emit('Uncategorized Tweet', 1); " +
        "}";

    String reduce = "function(key, values) { " +
        " return Array.sum(values); " +
        "}";

    MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
            null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = collection.mapReduce(cmd);

    try {
        for (DBObject o : out.results()) {
            System.out.println(o.toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
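To sanity-check the categories and counts before running this against the full collection, the same logic can be mirrored in plain Java on a handful of sample tweets. This is only a sketch: the class and method names (TweetCategoryCount, categorize, countByCategory) are made up for illustration, nothing here touches MongoDB, and it assumes the same first-match-wins rule as the map function above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the MongoDB driver.
class TweetCategoryCount {
    // Same case-insensitive patterns as the JavaScript map function.
    private static final Pattern AUTOBIOGRAPHY =
            Pattern.compile("autobiography", Pattern.CASE_INSENSITIVE);
    private static final Pattern BOOK =
            Pattern.compile("book", Pattern.CASE_INSENSITIVE);

    // Mirrors the map step: each tweet lands in exactly one category,
    // and the first matching pattern wins.
    static String categorize(String tweet) {
        if (AUTOBIOGRAPHY.matcher(tweet).find()) {
            return "Autobiography Tweet";
        }
        if (BOOK.matcher(tweet).find()) {
            return "Book Tweet";
        }
        return "Uncategorized Tweet";
    }

    // Mirrors the reduce step: sum the 1s emitted for each category.
    static Map<String, Integer> countByCategory(List<String> tweets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tweet : tweets) {
            counts.merge(categorize(tweet), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "An autobiography is a book that reveals nothing bad about its writer except his memory.",
                "Reading a good book tonight",
                "Nothing to see here");
        System.out.println(countByCategory(sample));
    }
}
```

Note that because of the else-if chain, a tweet mentioning both "autobiography" and "book" is counted only under the first category. If a tweet should count toward every keyword it contains, the map function would need separate if statements with one emit each.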

Although you have already accepted the answer from Kay, and this one is likely to be ignored, I would like to suggest an alternative solution.

The MongoDB documentation has an article on how to perform full-text search in Mongo. To search text fields quickly for individual words, it suggests preparing documents by splitting the text fields into arrays of individual words, storing these arrays in the documents alongside the full text, and creating an index on the array field.

After that, you can find all documents containing a given word very quickly, because the search query can (1) use an index and (2) avoid a regular expression, which can be very expensive.
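A rough sketch of the preparation step in plain Java might look like the following. The class name TweetKeywords, the helper names, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are my own assumptions, not something the documentation prescribes. The in-memory map only stands in for the index MongoDB would maintain on the word array; in a real setup you would store the array in each document and create the index on that field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the MongoDB driver.
class TweetKeywords {
    // Split a tweet into distinct lowercase words. This is the array that
    // would be stored in the document next to the full tweet text.
    static List<String> toWords(String tweet) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return new ArrayList<>(words);
    }

    // In-memory stand-in for the database index: word -> positions of the
    // tweets that contain it. A lookup is a map access, not a regex scan.
    static Map<String, List<Integer>> buildIndex(List<String> tweets) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < tweets.size(); i++) {
            for (String word : toWords(tweets.get(i))) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(i);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "An autobiography is a book that reveals nothing bad about its writer except his memory.",
                "Reading a good BOOK tonight");
        Map<String, List<Integer>> index = buildIndex(tweets);
        System.out.println("book -> " + index.get("book"));
        System.out.println("autobiography -> " + index.get("autobiography"));
    }
}
```

One trade-off of this approach is that it matches whole words only: a search for "book" will not find "books" the way the regular expression does, unless you also normalize or stem the words when building the array.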
