I use Twitter's real-time streaming API to keep a running count for specific tracks. For example, I want to count how many times "apple", "orange" and "pear" are tweeted. I store the tweet data in Mongo, but I have a question about how best to maintain the count for each track I follow.
I will read the counts once per second to stay close to a real-time figure for each track, so I need to make sure I am doing this the right way:
Option 1
Run a count query for a specific track:
db.tweets.count({track: 'apple'})
Given that a lot of data (potentially millions of tweets) will be stored in the tweets collection, I wonder whether this could be slow, especially when run once per second?
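To make Option 1 concrete, here is a minimal sketch of the once-per-second poll: one count query per track. The helper name `pollCounts` and the in-memory stand-in `makeFakeTweets` are hypothetical and exist only so the sketch runs without a server; the stand-in scans every document, which is roughly the work an unindexed count has to do.

```javascript
// Hypothetical helper: one count query per track, as in Option 1.
// `tweets` is anything with a mongo-shell-style count(filter) method.
function pollCounts(tweets, tracks) {
  const counts = {};
  for (const track of tracks) {
    counts[track] = tweets.count({ track: track });
  }
  return counts;
}

// In-memory stand-in for db.tweets (an assumption, not a real driver).
// It walks the whole array on every call, like a count without an index.
function makeFakeTweets(docs) {
  return {
    count(filter) {
      return docs.filter((doc) => doc.track === filter.track).length;
    },
  };
}

const fakeTweets = makeFakeTweets([
  { track: "apple", text: "sample tweet 1" },
  { track: "apple", text: "sample tweet 2" },
  { track: "orange", text: "sample tweet 3" },
]);

const counts = pollCounts(fakeTweets, ["apple", "orange", "pear"]);
console.log(counts); // apple: 2, orange: 1, pear: 0
```

In a recent mongo shell the same helper could be called as `pollCounts(db.tweets, ["apple", "orange", "pear"])`. Whether that is fast enough probably depends on an index on `track`, for example `db.tweets.createIndex({ track: 1 })`, so that each count does not have to scan the whole collection.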
Option 2
Create a second track_count collection and increment the count attribute every time a new tweet arrives:
{track:'apple', count:0}
{track:'orange', count:0}
{track:'pear', count:0}
Then, when a new tweet arrives:
db.track_count.update( { track:"apple" }, { $inc: { count : 1 } } );
That keeps a counter for each track, but it means writing to the database twice per tweet: once to store the tweet and again to increment the count for its track. Bear in mind that there may be a fair number (tens, possibly hundreds) of tweets arriving per second.
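For illustration, the Option 2 write path could look roughly like the sketch below. The function `recordTweet` and the in-memory stand-in `makeFakeDb` are hypothetical; the stand-in only understands the few calls used here. The `upsert: true` option is an addition to the update shown above: with it, the counter document is created on first use, so the three documents would not have to be seeded with `count: 0`.

```javascript
// Hypothetical write path for Option 2: two writes per incoming tweet.
function recordTweet(db, tweet) {
  // Write 1: store the tweet itself.
  db.tweets.insert(tweet);
  // Write 2: increment the per-track counter, creating it if missing.
  db.track_count.update(
    { track: tweet.track },
    { $inc: { count: 1 } },
    { upsert: true }
  );
}

// In-memory stand-in for the two collections (an assumption, not a driver).
function makeFakeDb() {
  const tweets = [];
  const counters = new Map();
  return {
    tweets: {
      insert(doc) { tweets.push(doc); },
      count() { return tweets.length; },
    },
    track_count: {
      update(filter, change, options) {
        const exists = counters.has(filter.track);
        // Without upsert, an update that matches nothing changes nothing.
        if (!exists && !(options && options.upsert)) return;
        const current = exists ? counters.get(filter.track) : 0;
        counters.set(filter.track, current + change.$inc.count);
      },
      findOne(filter) {
        if (!counters.has(filter.track)) return null;
        return { track: filter.track, count: counters.get(filter.track) };
      },
    },
  };
}

const fakeDb = makeFakeDb();
for (const track of ["apple", "apple", "pear"]) {
  recordTweet(fakeDb, { track: track, text: "sample tweet" });
}
const appleCounter = fakeDb.track_count.findOne({ track: "apple" });
console.log(appleCounter); // track "apple" with count 2
```

With this layout, the once-per-second read becomes a lookup of one small document per track, such as `db.track_count.findOne({ track: "apple" })`, instead of a count over the tweets collection. The cost is the extra write per tweet described above.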
Does anyone have any suggestions on the best method for this?