I streamed about 250,000 tweets into MongoDB and saved them there. Here I retrieve them based on a word or keyword contained in the tweet:
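```java
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;

// Connect to the local server (legacy 2.x Java driver API)
Mongo mongo = new Mongo("localhost", 27017);
DB db = mongo.getDB("TwitterData");
DBCollection collection = db.getCollection("publicTweets");

// Return only the "tweet" field and suppress "_id"
BasicDBObject fields = new BasicDBObject().append("tweet", 1).append("_id", 0);

// Match tweets whose text contains "autobiography" (case-sensitive regex)
BasicDBObject query = new BasicDBObject("tweet", new BasicDBObject("$regex", "autobiography"));

DBCursor cur = collection.find(query, fields);
while (cur.hasNext()) {
    System.out.println(cur.next());
}
```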
What I would like to do is use Map-Reduce: classify each tweet by keyword, then pass the results to a reduce function that counts the number of tweets in each category, similar to the example I linked. In that example, the documents are categorized by their number of pages. I want to do something like:
```java
"if (this.tweet.contains('kword1')) " +
"    category = 'kword1 tweets'; " +
"else if (this.tweet.contains('kword2')) " +
"    category = 'kword2 tweets'; "
```
and then use the reduce function to count them, as in the example program.
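To make the intended map and reduce steps concrete, here is a minimal in-memory sketch in plain Java, with no MongoDB involved. The class `KeywordMapReduceSketch`, its `map`/`reduce` methods, and the keyword-to-category pairs are hypothetical; matching is case-insensitive by assumption, and the sample tweets are the three shown in the output further down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory illustration of the keyword map/reduce idea (no MongoDB involved).
 * The keyword-to-category pairs are hypothetical examples.
 */
public class KeywordMapReduceSketch {

    /** One emitted (key, value) pair, like emit(category, 1) in a MongoDB map function. */
    static final class Emit {
        final String category;
        final int count;

        Emit(String category, int count) {
            this.category = category;
            this.count = count;
        }
    }

    /** Map phase: each tweet goes to the first category whose keyword it contains. */
    static List<Emit> map(List<String> tweets, Map<String, String> keywordToCategory) {
        List<Emit> emitted = new ArrayList<>();
        for (String tweet : tweets) {
            String text = tweet.toLowerCase(); // case-insensitive match (an assumption)
            for (Map.Entry<String, String> e : keywordToCategory.entrySet()) {
                if (text.contains(e.getKey().toLowerCase())) {
                    emitted.add(new Emit(e.getValue(), 1));
                    break; // stop at the first match, like an if / else-if chain
                }
            }
        }
        return emitted;
    }

    /** Reduce phase: sum the emitted counts per category. */
    static Map<String, Integer> reduce(List<Emit> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Emit e : emitted) {
            totals.merge(e.category, e.count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Keyword order matters: "autobiography" contains "biography", so test it first.
        Map<String, String> keywordToCategory = new LinkedHashMap<>();
        keywordToCategory.put("autobiography", "Autobiography Tweets");
        keywordToCategory.put("biography", "Biography Tweets");

        // The three sample tweets from the query output in the question.
        List<String> tweets = Arrays.asList(
            "An autobiography is a book that reveals nothing bad about its writer except his memory.",
            "I refuse to read anything that not real the only thing I've read since biff books is Jordan autobiography #lol",
            "well we've had the 2012 publication of Ashley Good Books, I predict 2013 will be seeing an autobiography ;)");

        System.out.println(reduce(map(tweets, keywordToCategory))); // prints {Autobiography Tweets=3}
    }
}
```

Because "autobiography" contains "biography", the order of the else-if chain decides which category such a tweet lands in; the `LinkedHashMap` keeps that order explicit.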
I know the syntax is incorrect, but that is pretty much what I would like to do. Is there any way to achieve it? Thanks!
PS: Oh, and I'm coding in Java, so Java syntax would be much appreciated. Thanks!
The output of the code above looks something like this:
```
{ "tweet" : "An autobiography is a book that reveals nothing bad about its writer except his memory."}
{ "tweet" : "I refuse to read anything that not real the only thing I've read since biff books is Jordan autobiography #lol"}
{ "tweet" : "well we've had the 2012 publication of Ashley Good Books, I predict 2013 will be seeing an autobiography ;)"}
```
This, of course, returns all tweets containing the word "autobiography". I would like to do this in a map function instead: classify each tweet as an "Autobiography Tweet" (and likewise for other keywords), then pass the results to a reduce function that counts the tweets in each category and returns the number of tweets containing each word.
Something like this:
```
{"_id" : "Autobiography Tweets" , "value" : { "publicTweets" : 3.0}}
{"_id" : "Biography Tweets" , "value" : { "publicTweets" : 15.0}}
```
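One way to write the map and reduce functions as valid Java strings is sketched below, assuming the legacy Java driver used in the question. The helper names (`TweetMapReduceFunctions`, `buildMap`, `jsQuote`, `REDUCE`) and the category names are hypothetical. The sketch tests keywords with `indexOf(...) !== -1`, which works in any JavaScript version (`contains` is not a standard JavaScript string method), and puts keywords in single quotes so they don't clash with the Java string's double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Builds the JavaScript map and reduce functions for MongoDB's mapReduce
 * as Java strings. Keyword/category pairs are hypothetical examples.
 */
public class TweetMapReduceFunctions {

    /** Escapes backslashes and single quotes so a value can sit inside a JS '...' literal. */
    static String jsQuote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /**
     * Map function: tests keywords in insertion order (if / else if), so a tweet
     * lands in the first matching category only, then emits {publicTweets: 1}.
     * Matching is case-insensitive (an assumption).
     */
    static String buildMap(Map<String, String> keywordToCategory) {
        StringBuilder js = new StringBuilder();
        js.append("function() { ");
        js.append("if (typeof this.tweet !== 'string') return; "); // skip docs without a tweet
        js.append("var text = this.tweet.toLowerCase(); ");
        js.append("var category = null; ");
        String prefix = "if";
        for (Map.Entry<String, String> e : keywordToCategory.entrySet()) {
            js.append(prefix)
              .append(" (text.indexOf(").append(jsQuote(e.getKey().toLowerCase())).append(") !== -1) ")
              .append("category = ").append(jsQuote(e.getValue())).append("; ");
            prefix = "else if";
        }
        js.append("if (category !== null) emit(category, {publicTweets: 1}); ");
        js.append("}");
        return js.toString();
    }

    /** Reduce function: sums partial counts, so it stays correct if MongoDB re-reduces. */
    static final String REDUCE =
        "function(key, values) { " +
        "var sum = 0; " +
        "values.forEach(function(v) { sum += v.publicTweets; }); " +
        "return {publicTweets: sum}; " +
        "}";

    public static void main(String[] args) {
        Map<String, String> keywordToCategory = new LinkedHashMap<>();
        keywordToCategory.put("autobiography", "Autobiography Tweets");
        keywordToCategory.put("biography", "Biography Tweets");
        System.out.println(buildMap(keywordToCategory));
        System.out.println(REDUCE);
    }
}
```

With the legacy driver, the two strings could then be passed to `DBCollection.mapReduce`, for example `collection.mapReduce(map, TweetMapReduceFunctions.REDUCE, null, MapReduceCommand.OutputType.INLINE, new BasicDBObject())`, iterating over `results()` on the returned `MapReduceOutput`. The reduce function sums `publicTweets` instead of counting entries because MongoDB may call reduce again on its own partial results, so its return value must have the same shape as the emitted values.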