To summarize the next steps from the chat room: the problem is the find() query, which scans all ~500k documents to find 15 matches:
    db.tweet_data.find({
        $or: [
            { in_reply_to_screen_name: /^kunalnayyar$/i, handle: /^kaleycuoco$/i, id: { $gt: 0 } },
            { in_reply_to_screen_name: /^kaleycuoco$/i, handle: /^kunalnayyar$/i, id: { $gt: 0 } }
        ],
        in_reply_to_status_id_str: { $ne: null }
    }).explain()

    {
        "cursor" : "BtreeCursor id_1",
        "nscanned" : 523248,
        "nscannedObjects" : 523248,
        "n" : 15,
        "millis" : 23682,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : { "id" : [ [ 0, 1.7976931348623157e+308 ] ] }
    }
This query uses case-insensitive regular expressions, which cannot use an index efficiently (although in this case there was no index on those fields to begin with; the explain output shows only the id index being walked in full).
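To see why a case-insensitive match is hard on an index, here is a rough sketch in plain JavaScript (run in Node, not the mongo shell). It is only a toy model: a sorted array stands in for the index, and the sample handles and the `matchPositions` helper are made up for illustration.

```javascript
// Toy model of an index: values kept in sorted, case-sensitive order.
// The handles below are invented sample data.
const indexedHandles = [
  'kaleycuoco', 'KaleyCuoco', 'kunalnayyar',
  'Kunalnayyar', 'bigbangtheory', 'sheldon'
].sort();

// Hypothetical helper: positions in the sorted list where the predicate matches.
function matchPositions(sortedValues, predicate) {
  const positions = [];
  sortedValues.forEach((value, i) => {
    if (predicate(value)) positions.push(i);
  });
  return positions;
}

// Case-insensitive regex: matches are scattered across the sorted order,
// so a range lookup cannot jump straight to them.
const regexHits = matchPositions(indexedHandles, h => /^kaleycuoco$/i.test(h));

// Same data normalized to lowercase: the matches sit next to each other,
// which is what an exact-match index lookup relies on.
const lowercased = indexedHandles.map(h => h.toLowerCase()).sort();
const exactHits = matchPositions(lowercased, h => h === 'kaleycuoco');

console.log(regexHits, exactHits);
```

With mixed-case values, uppercase variants sort before lowercase ones, so the regex matches are not adjacent; after normalizing to lowercase they form one contiguous run.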
Proposed Approach:
create lowercase handle_lc and inreply_lc search fields
add a compound index:
    db.tweet_data.ensureIndex({handle_lc:1, inreply_lc:1})
the order of the fields in the compound index lets you efficiently find all tweets either by handle_lc alone or by (handle_lc, inreply_lc)
search by exact match on the lowercase fields instead of by regular expression:
    db.tweet_data.find({
        $or: [
            { inreply_lc: 'kunalnayyar', handle_lc: 'kaleycuoco', id: { $gt: 0 } },
            { inreply_lc: 'kaleycuoco', handle_lc: 'kunalnayyar', id: { $gt: 0 } }
        ]
    })
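The steps above could be sketched roughly as follows. This is plain JavaScript rather than a tested migration script; the helper names `withLowercaseFields` and `conversationQuery` are hypothetical, and the sketch assumes each tweet document has the `handle` and `in_reply_to_screen_name` fields used in the original query.

```javascript
// Hypothetical helper: return a copy of a tweet document with the
// lowercase search fields (handle_lc, inreply_lc) filled in.
function withLowercaseFields(tweet) {
  return Object.assign({}, tweet, {
    handle_lc: tweet.handle ? tweet.handle.toLowerCase() : null,
    inreply_lc: tweet.in_reply_to_screen_name
      ? tweet.in_reply_to_screen_name.toLowerCase()
      : null
  });
}

// Hypothetical helper: build the exact-match query document for replies
// exchanged between two users, in either direction.
function conversationQuery(userA, userB) {
  const a = userA.toLowerCase();
  const b = userB.toLowerCase();
  return {
    $or: [
      { handle_lc: a, inreply_lc: b },
      { handle_lc: b, inreply_lc: a }
    ]
  };
}

// Example with an invented document and mixed-case input.
const sample = withLowercaseFields({
  handle: 'KaleyCuoco',
  in_reply_to_screen_name: 'KunalNayyar'
});
const query = conversationQuery('KunalNayyar', 'KaleyCuoco');
console.log(sample.handle_lc, sample.inreply_lc, JSON.stringify(query));
```

The query object would then be passed to `db.tweet_data.find(...)`. Lowercasing the input in one place means callers can pass handles in any case and still hit the index with an exact match.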