Is it bad to use huge "documents" in MongoDB?

Since we can structure a MongoDB document any way we want, we could do it this way:

{
  products: [
    { date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 } },
    { date: "2010-09-09", data: { pageviews: 36, timeOnPage: 202 } }
  ],
  brands: [
    { date: "2010-09-08", data: { pageviews: 123, timeOnPage: 210 } },
    { date: "2010-09-09", data: { pageviews: 61, timeOnPage: 876 } }
  ]
}

As we add data to it day after day, the products and brands arrays will keep growing. In 3 years there will be about a thousand elements in each of products and brands. Is that bad for MongoDB? Should we instead break it into 4 documents:

 { type: 'products', date: "2010-09-08", data: { pageviews: 23, timeOnPage: 178 }}
 { type: 'products', date: "2010-09-09", data: { pageviews: 36, timeOnPage: 202 }}
 { type: 'brands', date: "2010-09-08", data: { pageviews: 123, timeOnPage: 210 }}
 { type: 'brands', date: "2010-09-09", data: { pageviews: 61, timeOnPage: 876 }}

So, in 3 years there will be only 2,000 “documents”?

5 answers

Assuming you are using Mongoid (you tagged it), you will not want to use your first schema idea. It would be very inefficient for Mongoid to pull out these huge documents every time you wanted to look up even one small value.

You would probably be much better off with a model like this:

 class Log
   include Mongoid::Document

   field :type
   field :date
   field :pageviews, :type => Integer
   field :time_on_page, :type => Integer
 end

This will give you documents that look like this:

 {_id: ..., date: '2010-09-08', type: 'products', pageviews: 23, time_on_page: 178} 
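
If you already have data in the first, nested shape, converting it to flat records like the one above is straightforward. The following is only a sketch in plain Ruby (no Mongoid or driver involved); flatten_stats is a hypothetical helper name, and the input hash simply mirrors the structure from the question.

```ruby
# Hypothetical helper (not part of Mongoid): turns the single nested
# document from the question into one flat hash per type and day.
def flatten_stats(nested)
  nested.flat_map do |type, entries|
    entries.map do |entry|
      {
        'type'         => type.to_s,
        'date'         => entry['date'],
        'pageviews'    => entry['data']['pageviews'],
        'time_on_page' => entry['data']['timeOnPage']
      }
    end
  end
end

# Sample input copied from the question's first schema.
nested = {
  'products' => [
    { 'date' => '2010-09-08', 'data' => { 'pageviews' => 23, 'timeOnPage' => 178 } },
    { 'date' => '2010-09-09', 'data' => { 'pageviews' => 36, 'timeOnPage' => 202 } }
  ],
  'brands' => [
    { 'date' => '2010-09-08', 'data' => { 'pageviews' => 123, 'timeOnPage' => 210 } }
  ]
}

records = flatten_stats(nested)
records.each { |record| puts record.inspect }
```

Each resulting hash has the same keys as the fields declared on the Log model, so it could be handed to whatever insert call you use.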

Don't worry about the number of documents - Mongo can handle billions of them. And you can index by type and date to easily find the numbers you need.

In addition, this makes it much easier to update records through the driver without even pulling the record out of the database. For example, on every pageview you can do something like:

 Log.collection.update({'type' => 'products', 'date' => '2010-09-08'}, {'$inc' => {'pageviews' => 1}})
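
In practice you would probably build the selector and the update for the current day instead of hard-coding the date. A minimal sketch in plain Ruby follows; pageview_increment is a hypothetical helper name, and it assumes dates are stored as 'YYYY-MM-DD' strings and that the counter is the pageviews field declared in the model.

```ruby
require 'date'

# Hypothetical helper: returns the selector and the $inc update for one
# pageview of the given type on the given day (defaults to today).
def pageview_increment(type, day = Date.today)
  selector = { 'type' => type, 'date' => day.strftime('%Y-%m-%d') }
  update   = { '$inc' => { 'pageviews' => 1 } }
  [selector, update]
end

selector, update = pageview_increment('products', Date.new(2010, 9, 8))
puts selector.inspect
puts update.inspect
```

The two hashes are what you would pass as the first and second arguments of the update call.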

I am not a MongoDB expert, but 1,000 is not "huge". I would also seriously doubt there is any difference between 1 top-level document containing 4,000 sub-elements in total and 4 top-level documents each containing 1,000 sub-elements; it is six of one, half a dozen of the other.

Now, if you were talking about one document with 1,000,000 elements versus 1,000 documents each containing 1,000 elements, that is a difference of an order of magnitude or more, and there may be advantages to one or the other, both in storage and at query time.


You talked about how you are going to update the data, but how do you plan to query it? That probably affects how you should structure your documents.

The problem with using embedded elements in arrays is that each time you add to the array, the document may no longer fit in the space currently allocated for it. This causes the (new) document to be reallocated and moved, and that move requires rewriting every index entry for the document.

I would suggest the second form you proposed, but it depends on the questions above.

Note: 4 MB is an arbitrary limit and will be raised soon; you can recompile the server with whatever limit you want.


It seems your design is very similar to the relational table schema.

Thus, each added document will be a separate entry in the collection with its own identifier. Although the size of a single Mongo document is limited to 4 MB, that is generally enough to hold text documents. And you do not need to worry about the number of documents growing in Mongo; that is the essence of document-based databases.

The only thing you need to worry about is the size of the db collection. It is limited to about 2 GB on 32-bit systems, because MongoDB uses memory-mapped files, which are bound by the available address space. This is not a problem on 64-bit systems.

Hope this helps


Again, this depends on how you will query the data. If you really only care about single items, for example one product per day:

{type: 'products', date: "2010-09-08", data: {pageviews: 23, timeOnPage: 178}}

then you could group several days inside one document:

{type: 'products', {date: "2010-09-08", data: {pageviews: 23, timeOnPage: 178}}}

We use something like this:

{type: 'products', "2010": {"09": {"08": {data: {pageviews: 23, timeOnPage: 178}}}}}

So we can increment the counter for each day with: {"$inc": {"2010.09.08.data.pageviews": 1}}

It may seem complicated, but the advantage is that you can store all the data for a "type" in 1 record. Thus, you can fetch one record and have all the information.
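
To make the dotted path less error-prone, it can be generated from a date. The sketch below is plain Ruby with hypothetical helper names (daily_counter_path, apply_inc); apply_inc is only an in-memory stand-in that mimics how $inc creates missing nested fields, not a driver call.

```ruby
require 'date'

# Hypothetical helper: builds the dotted field path for a given day,
# e.g. "2010.09.08.data.pageviews".
def daily_counter_path(day, counter)
  format('%04d.%02d.%02d.data.%s', day.year, day.month, day.day, counter)
end

# In-memory stand-in for $inc, for illustration only: walks the dotted
# path, creates missing nested hashes, and adds amount to the leaf.
def apply_inc(doc, path, amount)
  *parents, leaf = path.split('.')
  node = parents.reduce(doc) { |hash, key| hash[key] ||= {} }
  node[leaf] = node.fetch(leaf, 0) + amount
  doc
end

doc    = { 'type' => 'products' }
path   = daily_counter_path(Date.new(2010, 9, 8), 'pageviews')
update = { '$inc' => { path => 1 } } # what would be sent to the server

# Simulate two pageviews on the same day.
apply_inc(doc, path, 1)
apply_inc(doc, path, 1)
puts doc.inspect
```

The update hash is the part you would actually send; the simulation is there only to show which nested structure the path addresses.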

