How to handle database cleanup in MongoDB

I use MongoDB to store 30 days' worth of data that arrives as a stream. I am looking for a purging mechanism that lets me throw away the oldest data to make room for new data. In MySQL I handled this with partitioning: I kept 30 date-based partitions, dropped the oldest one, and created a new partition for the incoming data.

When I try to map the same approach onto MongoDB, the closest analogue seems to be date-based shards. The problem is that this skews data distribution badly: if all the new data lives on one shard, that shard becomes hot because everyone accesses it, while the shards holding older data sit mostly idle.
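
For illustration, a rough mongo shell sketch of that setup (the database, collection, and field names here are invented for the example, not from the original):

    // Hypothetical names: "mydb", "events", and the "ts" field are invented.
    sh.enableSharding("mydb")

    // A monotonically increasing shard key (a timestamp) routes every new
    // insert to the chunk with the highest range, so a single shard takes
    // the whole write load while shards holding older dates sit idle.
    sh.shardCollection("mydb.events", { ts: 1 })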

An alternative is collection-based cleanup: keep 30 collections and drop the oldest one to make room for new data. But that raises a couple of problems: 1) with smaller per-day collections I cannot benefit from sharding, since sharding is configured per collection; 2) my queries have to change to query all 30 collections and take the union of the results.
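
To make the second problem concrete, a hedged mongo shell sketch of what querying across 30 daily collections could look like (all names here are hypothetical):

    // Invented names: daily collections like "events_20120101", field "userId".
    var results = [];
    db.getCollectionNames().forEach(function (name) {
        if (name.indexOf("events_") !== 0) return;
        db.getCollection(name).find({ userId: 42 }).forEach(function (doc) {
            results.push(doc);
        });
    });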

Please suggest a good purging mechanism (if there is one) for handling this situation.

+8
mongodb database-design
4 answers

There are really only three ways to do cleanup in MongoDB, and it sounds like you have already identified several of the trade-offs.

  • Single collection, delete old entries
  • Collection per day, drop old collections
  • Database per day, drop old databases

Option #1: single collection

Pros

  • Easy to implement
  • Easy to run Map/Reduce jobs

Cons

  • Removal is just as expensive as insertion: it generates a lot of I/O and creates the need to "defragment" or "compact" the database (a minimal sketch of this approach follows the list).
  • At some point you end up handling double the write load, since you have to both insert a day's worth of data and delete a day's worth of data.
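
A minimal mongo shell sketch of option #1, assuming a collection named events with an indexed timestamp field ts (both names invented here):

    // Assumed names: collection "events", indexed timestamp field "ts".
    db.events.ensureIndex({ ts: 1 });

    // Delete everything older than 30 days; each deleted document costs
    // index updates and I/O, much like an insert.
    var cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
    db.events.remove({ ts: { $lt: cutoff } });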

Option #2: collection per day

Pros

  • Dropping old data with collection.drop() is very fast (sketched after this list).
  • Still Map/Reduce friendly, since each day's output can be merged or re-reduced into totals.

Cons

  • You may still have some fragmentation problems.
  • You will need to rewrite your queries. However, in my experience, if you have enough data to warrant purging, you rarely access that data directly; instead you run Map/Reduces over it, so this may not change that many queries.
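
A sketch of option #2 in the mongo shell, again with invented names: write to a collection named after the current day and drop the collection that has aged out.

    // Invented naming scheme: one collection per day, e.g. "events_20120131".
    function dailyName(d) {
        return "events_" + d.toISOString().slice(0, 10).replace(/-/g, "");
    }

    // Writes always go to today's collection ...
    db.getCollection(dailyName(new Date())).insert({ ts: new Date(), value: 1 });

    // ... and the collection that has aged out is dropped, which is nearly instant.
    var expired = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
    db.getCollection(dailyName(expired)).drop();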

Option #3: database per day

Pros

  • Deletion is as fast as it can be: the data files are simply removed (see the sketch below).
  • Zero fragmentation problems, and old data is easy to back up / restore / archive.

Cons

  • Makes querying even more complicated (expect to write some wrapper code).
  • It is not as easy to write Map/Reduce jobs, though take a look at the aggregation framework, which may suit your needs better anyway.
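
A sketch of option #3, with invented database names: write each day's data into its own database and drop whole databases as they expire.

    // Invented naming scheme: one database per day, e.g. "events_20120131".
    var today = db.getSiblingDB("events_20120131");
    today.stream.insert({ ts: new Date(), value: 1 });

    // Dropping an expired day's database deletes its files outright.
    db.getSiblingDB("events_20120101").dropDatabase();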

Now there is an option #4, but it is not a general solution. I know some people who did "cleanup" simply by using capped collections. There are certain cases where this works, but it comes with a ton of caveats, so you really need to know what you are doing.
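
For completeness, a hedged sketch of that capped-collection variant (the collection name and size are arbitrary examples):

    // A capped collection overwrites its oldest documents once its fixed
    // size is reached; 10 GB here is an arbitrary example. Note it caps by
    // bytes, not by age, and individual documents cannot be removed or grown.
    db.createCollection("events", { capped: true, size: 10 * 1024 * 1024 * 1024 });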

+9

You can set a TTL index on a collection as of MongoDB version 2.2 or higher. That will expire old data from the collection for you.

Follow this link: http://docs.mongodb.org/manual/tutorial/expire-data/
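
A minimal example of the TTL approach from that tutorial (the collection and field names are assumptions for illustration):

    // Example names; documents expire 30 days after their "createdAt" value.
    // Requires MongoDB 2.2+; the TTL monitor removes expired documents in
    // the background, roughly once a minute.
    db.events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 30 });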

+5

I had a similar situation, and this page helped me, especially its "Useful Scripts" section: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
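
That page deals with reclaiming disk space after heavy deletes. For reference, a hedged example of the kind of command it covers (the collection name is invented):

    // Rewrites and defragments a single collection's data and rebuilds its
    // indexes; it blocks operations on the database while it runs.
    db.runCommand({ compact: "events" });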

0

It is better to keep a separate archive server. Purge the live data every 15 days, and delete the oldest data from the archive as well. Keep the archive on a partition with plenty of disk space.

0
