Let me give you a couple of tips based on my general knowledge and experience:
Use shorter field names
MongoDB stores the field names in every document. This repetition increases disk usage, which can lead to performance issues in a very large database like yours.
Pros:
- Smaller documents, therefore less disk space
- More documents fit in RAM (more caching)
- Index sizes will be smaller in some scenarios
Cons:
- Less readable field names
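One way to keep readable names in application code while storing short ones is a small mapping layer. This is only a sketch: the mapping table and the helper names (`SHORT_NAMES`, `shorten`, `expand`) are made up for illustration and are not part of MongoDB.

```javascript
// Hypothetical mapping from readable names to the short names stored on disk.
const SHORT_NAMES = { firstName: "fn", lastName: "ln", createdAt: "ca" };

// Reverse mapping, built once, so stored documents can be expanded on read.
const LONG_NAMES = Object.fromEntries(
  Object.entries(SHORT_NAMES).map(([longName, shortName]) => [shortName, longName])
);

// Rename top-level keys using the given mapping; unknown keys pass through.
function renameKeys(doc, mapping) {
  return Object.fromEntries(
    Object.entries(doc).map(([key, value]) => [
      Object.prototype.hasOwnProperty.call(mapping, key) ? mapping[key] : key,
      value,
    ])
  );
}

const shorten = (doc) => renameKeys(doc, SHORT_NAMES);
const expand = (doc) => renameKeys(doc, LONG_NAMES);

const user = { firstName: "Ada", lastName: "Lovelace", createdAt: 0 };
const stored = shorten(user); // what would be written to the collection
```

The application would call `shorten` before writing and `expand` after reading, so the loss of readability stays confined to the stored documents.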
Index Size Optimization
The smaller the index, the more of it fits in RAM and the fewer index misses occur. Consider, for example, the SHA1 hash of a git commit. A git commit is often identified by just the first 5-6 characters of its hash. In that case, store only those 5-6 characters instead of the entire hash.
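As a rough sketch of the hash example, the helper below keeps only a prefix and checks that the prefixes are still unique for the values you have. The function names and the two sample hashes are made up for illustration.

```javascript
// Keep only the first `length` hex characters of a full hash string.
function shortHash(fullHash, length = 6) {
  return fullHash.slice(0, length);
}

// Check that shortening to `length` characters keeps the hashes distinct.
// Assumes the input array itself contains no duplicates.
function isUniqueAt(hashes, length) {
  const prefixes = new Set(hashes.map((h) => shortHash(h, length)));
  return prefixes.size === hashes.length;
}

// Made-up 40-character values standing in for real commit hashes.
const hashes = [
  "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678",
  "a1b2c9ffe5f60718293a4b5c6d7e8f9012345678",
];

const indexedValue = shortHash(hashes[0]); // "a1b2c3"
```

A check like `isUniqueAt` is worth running before committing to a prefix length, since a prefix that is too short will make distinct values collide.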
Understand padding factor
An update that grows a document can trigger a document move, which is expensive: the old document is deleted, the updated document is written to a new empty location, and the indexes are updated.
We need to make sure that documents do not move when they are updated. Each collection has a padding factor which, on document insert, determines how much extra space is allocated beyond the actual size of the document.
You can see the padding factor of the collection using:
db.collection.stats().paddingFactor
Manual padding
In your case, you will probably start with a small document that will grow. Updating the document later will result in several document moves. Therefore, it is better to add padding to the document. Unfortunately, there is no easy way to add padding. We can do it by adding some random bytes to some key when performing the insert, and then deleting this key in the next update query.
Finally, if you are sure that some keys will be added to the documents in the future, preallocate those keys with default values so that future updates do not increase the size of the document and cause it to move.
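The manual padding trick described above could look roughly like this. It is a sketch under assumptions: the `_pad` key, the `jobs` collection, and the helper names are hypothetical, and a fixed filler character is used instead of random bytes for simplicity. `$set` and `$unset` are the standard update operators.

```javascript
// Hypothetical name of the throwaway padding key.
const PAD_KEY = "_pad";

// Return a copy of the document with `padBytes` characters of filler added.
function withPadding(doc, padBytes) {
  return { ...doc, [PAD_KEY]: "x".repeat(padBytes) };
}

// Build an update that writes the real fields and drops the padding
// in the same operation, so the document can grow into the freed space.
function growUpdate(newFields) {
  return { $set: newFields, $unset: { [PAD_KEY]: "" } };
}

const inserted = withPadding({ name: "job-1", log: [] }, 512);
const update = growUpdate({ log: ["started", "finished"] });

// In the mongo shell these would be used roughly as:
//   db.jobs.insert(inserted)
//   db.jobs.update({ name: "job-1" }, update)
```

The padding size (512 here) is arbitrary; it should be chosen from how much you expect the document to grow.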
You can get information about the queries causing document moves:
db.system.profile.find({ moved: { $exists : true } })
A large number of collections VS a large number of documents in a few collections
Schema design depends on the requirements of the application. If there is a huge collection in which we query only the last N days of data, we can choose to keep that data in a separate collection, and old data can be safely archived. This ensures that caching in RAM works properly.
Every collection incurs a cost beyond the cost of creating it. Each collection has a minimum size of several kilobytes plus one index (8 KB). Each collection has an associated namespace, and by default we have 24K namespaces. For example, having one collection per user is a poor choice because it does not scale. After some time, Mongo will not allow us to create new collections or indexes.
As a rule, having a large number of collections does not carry a significant performance penalty. For example, we can choose to have one collection per month if we know that we always query by month.
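A one-collection-per-month layout needs a consistent way to derive the collection name from a date. The sketch below is one possible convention; the `events` base name and the `_YYYY_MM` suffix format are assumptions, not something MongoDB prescribes.

```javascript
// Derive a per-month collection name such as "events_2015_03".
// UTC is used so the name does not depend on the server's time zone.
function monthlyCollectionName(base, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${base}_${year}_${month}`;
}

const name = monthlyCollectionName("events", new Date(Date.UTC(2015, 2, 9)));

// In the mongo shell the collection would then be reached with:
//   db.getCollection(name).find({ ... })
```

Queries for a given month then touch only that month's collection, and older collections can be archived or dropped as a whole.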
Data Denormalization
It is always recommended to keep all the related data for a query or sequence of queries in the same place on disk. Sometimes this means duplicating information across different documents. For example, for a blog post, you would store its comments inside the post document.
Pros:
- The index size will be much smaller, since there will be fewer index entries
- The query that gathers all the necessary details will be very fast
- The size of the document will be comparable to the page size, which means that when we bring this data into RAM, most of the time we are not bringing other data along with the page
- A document move frees a whole page, and not a tiny fragment of a page that may not be usable for later inserts
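For the blog post example, a denormalized document and the update that appends a comment might look like the sketch below. The field names and the `posts` collection are hypothetical; `$push` is the standard array update operator.

```javascript
// A post document with its comments embedded, so one read returns everything.
const post = {
  title: "Schema design tips",
  body: "...",
  comments: [{ author: "alice", text: "Nice write-up" }],
};

// Build the update that appends one comment to the embedded array.
function addCommentUpdate(comment) {
  return { $push: { comments: comment } };
}

const update = addCommentUpdate({ author: "bob", text: "Thanks" });

// In the mongo shell:
//   db.posts.update({ title: "Schema design tips" }, update)
```

Keep in mind that an embedded array that grows also grows the document, which ties back to the padding discussion earlier.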
Capped collections
Capped collections behave like circular buffers. They are special collections of a fixed size. These collections support high-speed writes and sequential reads. Because the size is fixed, once the allocated space is full, new documents are written by deleting the oldest ones. However, document updates are only allowed if the updated document fits the size of the original document (play with padding for more flexibility).
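To make the circular-buffer behavior concrete, here is a toy in-memory model. It is not how MongoDB implements capped collections, and it caps by document count rather than by bytes to keep the example short; the class name is made up.

```javascript
// Toy model of a capped collection: fixed capacity, oldest documents evicted.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // Append a document; once full, drop the oldest one to make room.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Return documents in insertion order, like a sequential read.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedBuffer(3);
[1, 2, 3, 4].forEach((n) => log.insert({ n }));

// A real capped collection is created in the mongo shell with, for example:
//   db.createCollection("log", { capped: true, size: 1048576 })
```

After the fourth insert the buffer holds documents 2, 3 and 4; the first one has been overwritten, which is the behavior described above.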