MongoDB too many records?

Question


I have a PHP application that interacts with MongoDB. Until recently the application worked fine, but a few days ago I found that it had started to respond REALLY slowly. One of the collections has grown to 500K+ records, and MongoCursor now times out on any query against that collection.

I don't think 500K records is too many. Other pages that use MongoDB are also starting to slow down, but not as much as the one that uses the collection with 500K entries. Static pages that do not interact with MongoDB still respond quickly.

I'm not sure what the problem might be. I have indexed the collections, so missing indexes do not seem to be the cause. Another point to note is that the server has 512 MB of RAM, and while PHP is running the Mongo queries, the top command shows 15000k of free memory.

Any help would be greatly appreciated.

+4

php mongodb

Ayush Chaudhary Jul 28 '12 at 12:40


2 answers

Yes, 500K+ should be fine. As far as I know, there is no real "limit" on the number of documents in a collection. The ceiling is probably the number of unique _id values that MongoDB can generate, which is far more than 500K. In your case I suspect that your query is not very selective. With fewer documents in the collection you did not notice the problem, but as the collection grows the query becomes sluggish. For example, how many documents does the MongoCursor return?

0

Aafreen sheikh Jul 28 '12 at 12:58


Stennie · Accepted Answer · 2012-07-28T15:09:26+0000

To summarize the next steps from the chat room: the problem is a find() query that scans all ~500k documents to find 15:

db.tweet_data.find({
    $or: [
        {
            in_reply_to_screen_name: /^kunalnayyar$/i,
            handle: /^kaleycuoco$/i,
            id: { $gt: 0 }
        },
        {
            in_reply_to_screen_name: /^kaleycuoco$/i,
            handle: /^kunalnayyar$/i,
            id: { $gt: 0 }
        }
    ],
    in_reply_to_status_id_str: { $ne: null }
}).explain()

{
    "cursor" : "BtreeCursor id_1",
    "nscanned" : 523248,
    "nscannedObjects" : 523248,
    "n" : 15,
    "millis" : 23682,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "id" : [
            [ 0, 1.7976931348623157e+308 ]
        ]
    }
}
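One way to read this explain() output is to compare the number of index entries scanned with the number of documents returned. The following is a minimal sketch in plain JavaScript (it needs no database connection); the helper name scanRatio is hypothetical, and the figures are copied from the output above.

```javascript
// Figures copied from the explain() output in this answer.
var explainOutput = { nscanned: 523248, nscannedObjects: 523248, n: 15, millis: 23682 };

// Hypothetical helper: index entries scanned per document returned.
// A value near 1 suggests a selective index; a large value means most
// of the scanned entries were examined and then discarded.
function scanRatio(explain) {
  if (explain.n === 0) {
    return Infinity;
  }
  return explain.nscanned / explain.n;
}

var ratio = scanRatio(explainOutput);
console.log("entries scanned per returned document: " + ratio.toFixed(1)); // 34883.2
```

With roughly 35,000 entries scanned for each document returned, the id index is doing almost no filtering for this query.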

This query uses case-insensitive regular expressions, which cannot use an index efficiently (although in this case there was no suitable index defined on those fields anyway).
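To illustrate why, here is a simplified model in plain JavaScript, assuming index keys are kept in case-sensitive sorted order (the sample handles other than the two from the query, and the helper name matchingPositions, are made up for the illustration). An exact match lands on one spot in the sorted keys, while the case-insensitive matches are scattered, so no single contiguous range covers them.

```javascript
// Simplified stand-in for an index: keys sorted in case-sensitive order.
var keys = [
  "KaleyCuoco", "Kunalnayyar", "kaleycuoco",
  "kunalnayyar", "KALEYCUOCO", "jimparsons"
].sort();
// Sorted order: KALEYCUOCO, KaleyCuoco, Kunalnayyar, jimparsons, kaleycuoco, kunalnayyar

// Hypothetical helper: positions in the sorted keys that satisfy a predicate.
function matchingPositions(sortedKeys, predicate) {
  var positions = [];
  sortedKeys.forEach(function (key, i) {
    if (predicate(key)) {
      positions.push(i);
    }
  });
  return positions;
}

var regexHits = matchingPositions(keys, function (k) { return /^kaleycuoco$/i.test(k); });
var exactHits = matchingPositions(keys, function (k) { return k === "kaleycuoco"; });

console.log(regexHits); // [ 0, 1, 4 ] : scattered, other keys sit in between
console.log(exactHits); // [ 4 ]       : a single position
```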

Proposed Approach:

1. Create lowercase handle_lc and inreply_lc fields for searching.
2. Add a compound index on them:
   db.tweet_data.ensureIndex({handle_lc: 1, inreply_lc: 1})
   The order of the fields in the compound index lets you efficiently find all tweets either by handle alone or by (handle, in_reply_to).
3. Search by exact match instead of a regular expression:

db.tweet_data.find({
    $or: [
        {
            in_reply_to_screen_name: 'kunalnayyar',
            handle: 'kaleycuoco',
            id: { $gt: 0 }
        },
        {
            in_reply_to_screen_name: 'kaleycuoco',
            handle: 'kunalnayyar',
            id: { $gt: 0 }
        }
    ]
})
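For the compound index to help, the exact-match query would presumably need to target the lowercase fields themselves. The sketch below shows one way the steps could fit together in plain JavaScript. The field names handle_lc and inreply_lc come from the suggestion above; the helper names withLowercaseFields and conversationQuery are hypothetical, and the id: { $gt: 0 } condition is left out for brevity.

```javascript
// Hypothetical helper: copy a tweet document and add the lowercase search
// fields. A missing value is stored as null rather than an empty string.
function withLowercaseFields(tweet) {
  var copy = {};
  Object.keys(tweet).forEach(function (k) { copy[k] = tweet[k]; });
  copy.handle_lc = tweet.handle ? tweet.handle.toLowerCase() : null;
  copy.inreply_lc = tweet.in_reply_to_screen_name
    ? tweet.in_reply_to_screen_name.toLowerCase()
    : null;
  return copy;
}

// Hypothetical helper: exact-match query for tweets exchanged between two
// users. Both $or branches constrain handle_lc and inreply_lc, the two
// fields of the index { handle_lc: 1, inreply_lc: 1 }.
function conversationQuery(userA, userB) {
  var a = userA.toLowerCase();
  var b = userB.toLowerCase();
  return {
    $or: [
      { handle_lc: a, inreply_lc: b },
      { handle_lc: b, inreply_lc: a }
    ]
  };
}

var stored = withLowercaseFields({ handle: "KaleyCuoco", in_reply_to_screen_name: "KunalNayyar" });
var query = conversationQuery("KaleyCuoco", "kunalnayyar");
console.log(stored.handle_lc, stored.inreply_lc);
console.log(JSON.stringify(query));
```

In the mongo shell, the object returned by conversationQuery could be passed straight to db.tweet_data.find(), and explain() on that query should then show far fewer scanned entries if the index is being used.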
