MongoDB (noSQL) when to split collections

Question

So, I am writing an application in NodeJS and ExpressJS. This is my first time using a NoSQL database like MongoDB, and I'm trying to figure out how to lay out my data model.

At the beginning of our project we wrote everything down in terms of a relational database, but since we recently switched from Laravel to ExpressJS, I am a little stuck on what to do with all my table layouts.

So far I have found out that it is better to denormalize your schema, but it has to end somewhere, right? Taken to the extreme, you could store all your data in one collection. Well, not literally, but you get the idea.

1. So, is there a rule or standard that defines where to cut to create multiple collections? I have a relational database with users (who are either a customer or a store user), stores, products, purchases, categories, and subcategories.

2. Is it wrong to define relationships in a NoSQL database? For example, every product has a category, but I want to reference the category by id (a parent reference does the job in MongoDB). Is that bad? Or is it a trade-off between performance and structure?

3. Is NoSQL / MongoDB used for large databases that would have many relationships (if they were built in MySQL)?

Thanks in advance

+2

node.js mongodb

Jesse struyvelt May 06 '15 at 11:30

2 answers

The very first thing to consider when choosing a NoSQL storage solution over a relational one is that things do not work the same way, and therefore have to be designed differently.

More specifically, solutions like MongoDB are not intended to emulate the relational join structure found in many SQL (and therefore relational) back ends; they are intended to look at "joined" data differently.

This applies to your questions as follows:

1. There really is no set rule. Understand that the rules of normalization do not apply here, which is the main reason NoSQL solutions exist: they are meant to offer something different that might work well for your situation.
2. Is it bad? Is it good? Both are subjective. Bearing point 1 in mind, the main consideration is that non-relational or NoSQL databases are designed to do things differently from a relational system. As such, there is usually a penalty for emulating joins in a relational manner. For MongoDB specifically, this means additional queries. That does not mean you cannot or should not do it; rather, it all depends on the usage patterns of your application.
3. Restating the points above, NoSQL is generally designed to solve problems that do not fit the traditional SQL and/or relational design pattern, and to replace that pattern with something else. The ultimate goal here is for you to rethink your data access patterns and design your application around a storage model better suited to how the application actually accesses its data.

In short, there are no strict rules, and that is part of the point of moving away from the rules of nth normal form. NoSQL solutions such as MongoDB let you store nested structures, which typical SQL / relational solutions do not offer in an efficient way.
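
As a rough illustration of such a nested structure, here is a hypothetical purchase with its line items embedded. The field names (`customer`, `items`, `sku`, `qty`, `priceCents`) and all values are assumptions made up for this sketch, not something from the question; a plain JavaScript object stands in for what would be stored as a single MongoDB document.

```javascript
// Hypothetical purchase document: the line items live inside the purchase,
// so reading one purchase needs no join and no second query.
const purchase = {
  _id: "purchase-1",          // illustrative id
  customer: "customer-42",    // assumed reference to a user document
  items: [
    { sku: "A-100", name: "Notebook", qty: 2, priceCents: 350 },
    { sku: "B-200", name: "Pen", qty: 10, priceCents: 80 }
  ]
};

// Everything needed to compute the total is already in the one document.
const totalCents = purchase.items.reduce(
  (sum, item) => sum + item.qty * item.priceCents,
  0
);

console.log(totalCents);
```

In a relational schema the same data would typically be spread over a purchases table and a line-items table and reassembled with a join.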

The other side of the argument is that operations such as joins do not scale well to big data, so a different way of "joining" is needed; MongoDB's answer is concepts such as embedded data structures.

You would do well to read some of the guides on the kinds of data storage and access that the various NoSQL solutions are suited to. That is ultimately what you need to decide in order to determine what is best for you and your application.

In the end, it should be about recognizing when the SQL / relational model does not meet your needs, and then choosing something else.

+4

user3561036 May 06 '15 at 11:51

Markus W Mahlberg · Accepted Answer · 2015-05-06T14:27:41+0000

As already written, there are no rules for MongoDB comparable to second normal form in SQL.

However, there are some best practices and common pitfalls related to optimizing for MongoDB, which I will list here.

Overuse of embedding

The BSON limit

Contrary to popular belief, there is nothing wrong with references. Suppose you have a library of books and want to track their rentals. You might start with a model like this:

    {
      // We use the ISBN for its uniqueness
      _id: "9783453031456",
      title: "Schismatrix",
      author: "Bruce Sterling",
      rentals: [
        {
          name: "Markus Mahlberg",
          start: "2015-05-05T03:22:00Z",
          due: "2015-05-12T12:00:00Z"
        }
      ]
    }

While there are several problems with this model, the most important one is not obvious: the number of rentals is limited, because BSON documents have a size limit of 16 MB.
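
To get a feel for that ceiling, the following sketch makes a back-of-the-envelope estimate. It assumes that the JSON byte length of one rental entry is a usable stand-in for its BSON size, which is only approximately true, and it ignores the space taken by the rest of the book document. Treat the result as an order of magnitude, not a precise limit.

```javascript
// Rough estimate only: JSON byte length is used as a stand-in for BSON size,
// which differs somewhat in practice (type tags, length prefixes, and so on).
const MAX_BSON_BYTES = 16 * 1024 * 1024; // the 16 MB document size limit

const sampleRental = {
  name: "Markus Mahlberg",
  start: "2015-05-05T03:22:00Z",
  due: "2015-05-12T12:00:00Z"
};

// Approximate bytes taken by one embedded rental entry.
const bytesPerRental = Buffer.byteLength(JSON.stringify(sampleRental), "utf8");

// Upper bound on how many such entries fit into a single document.
const maxRentals = Math.floor(MAX_BSON_BYTES / bytesPerRental);

console.log(bytesPerRental, maxRentals);
```

The bound is large, but it is finite, and a popular book in a busy library moves steadily toward it with every rental.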

The document migration problem

The other problem with storing rentals in an array is that it causes relatively frequent document migrations, which are rather costly operations. BSON documents are never fragmented, and they are created with some extra space allocated in advance for when they grow. This extra space is called padding. When the padding is exceeded, the document is moved to another location in the data files and new padding is allocated. Frequent additions of data therefore cause frequent document migrations. Hence it is best to avoid frequent updates that increase the size of a document, and to use references instead.

So, for our example, we would change the single model and create a second one. First, the model for a book:

 { _id: "9783453031456", title:"Schismatrix", author: "Bruce Sterling" }

The second rental model will look like this:

    {
      _id: new ObjectId(),
      book: "9783453031456",
      rentee: "Markus Mahlberg",
      start: ISODate("2015-05-05T03:22:00Z"),
      due: ISODate("2015-05-05T12:00:00Z"),
      returned: ISODate("2015-05-05T11:59:59.999Z")
    }

The same approach can of course be used for the author or the rentee.
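
A minimal sketch of how the split model behaves, using in-memory arrays as stand-ins for the two collections rather than a real MongoDB connection. The helpers `rentBook` and `openRentalsFor`, the second rentee, and the return date are hypothetical and exist only for this illustration.

```javascript
// In-memory stand-ins for the two collections (no real MongoDB connection).
const books = [
  { _id: "9783453031456", title: "Schismatrix", author: "Bruce Sterling" }
];
const rentals = [];

// Recording a rental only appends to `rentals`; the book document is untouched.
// With a real database this would be an insert into the rentals collection.
function rentBook(isbn, rentee, start, due) {
  const rental = { book: isbn, rentee, start, due, returned: null };
  rentals.push(rental);
  return rental;
}

// Roughly what db.rentals.find({ book: isbn, returned: null }) would express.
function openRentalsFor(isbn) {
  return rentals.filter(r => r.book === isbn && r.returned === null);
}

const first = rentBook("9783453031456", "Markus Mahlberg",
  new Date("2015-05-05T03:22:00Z"), new Date("2015-05-12T12:00:00Z"));
first.returned = new Date("2015-05-11T09:00:00Z"); // hypothetical return date
rentBook("9783453031456", "Another Reader",        // hypothetical second rentee
  new Date("2015-05-13T10:00:00Z"), new Date("2015-05-20T12:00:00Z"));

console.log(openRentalsFor("9783453031456").length);
```

However many rentals accumulate, the book document keeps its size, so it neither approaches the BSON limit nor has to be moved.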

The problem with over-normalization

Let us look back for a moment. A developer would identify the entities involved in a business case, define their properties and relations, write the corresponding entity classes, bang their head against the wall for a few hours to get the triple inner/outer JOIN required for the use case working, and everybody lived happily ever after. So why use NoSQL in general, and MongoDB in particular? Because nobody lived happily ever after. That approach scales horribly, and almost the only way to scale it is vertically.

The main difference with NoSQL is that you model your data according to the questions you need answered.

So let us look at a typical n:m relation, taking the relation between authors and books as our example. In SQL you would have three tables: two for your entities (books and authors) and one for the relation (who is the author of which book?). Of course, you could take those tables and create equivalent collections. But since there are no JOINs in MongoDB, you would need three queries (one for the first entity, one for its relations, and one for the related entities) to find the documents related to an entity. That would not make sense, since the three-table approach to n:m relations was invented specifically to work around the strict schemas of SQL databases.

Since MongoDB has a flexible schema, the first question is where to store the relation while avoiding the problems that arise from overuse of embedding. An author may write quite a few books in the years to come, but the authorship of a book rarely, if ever, changes. So the answer is simple: we store references to the authors in the book document.

 { _id: "9783453526723", title: "The Difference Engine", authors: ["idOfBruceSterling","idOfWilliamGibson"] }

And now we can find the authors of that book with two queries:

    var book = db.books.findOne({ title: "The Difference Engine" })
    var authors = db.authors.find({ _id: { $in: book.authors } })
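
The same model also answers the reverse question (which books did a given author write?) with a single lookup. The sketch below uses in-memory arrays as stand-ins for the `books` and `authors` collections; the second book entry, the author `name` fields, and the helper names `authorsOf` and `booksBy` are assumptions for illustration only.

```javascript
// In-memory stand-ins for the authors and books collections.
const authors = [
  { _id: "idOfBruceSterling", name: "Bruce Sterling" },
  { _id: "idOfWilliamGibson", name: "William Gibson" }
];
const books = [
  { _id: "9783453526723", title: "The Difference Engine",
    authors: ["idOfBruceSterling", "idOfWilliamGibson"] },
  { _id: "9783453031456", title: "Schismatrix",
    authors: ["idOfBruceSterling"] }
];

// Book -> authors: two lookups, mirroring findOne followed by find with $in.
function authorsOf(title) {
  const book = books.find(b => b.title === title);
  if (!book) return [];
  return authors.filter(a => book.authors.includes(a._id));
}

// Author -> books: one lookup. In the mongo shell this would be
// db.books.find({ authors: authorId }), since an equality match on an
// array field matches documents whose array contains that value.
function booksBy(authorId) {
  return books.filter(b => b.authors.includes(authorId));
}

console.log(authorsOf("The Difference Engine").map(a => a.name));
console.log(booksBy("idOfBruceSterling").map(b => b.title));
```

Storing the relation on the book side keeps the array small and stable, because a book's list of authors rarely changes after publication.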

I hope this helps you decide when to actually split up your collections and how to avoid the most common pitfalls.

Conclusion

As for your questions, here are my answers:

1. As written before: no, but keeping the technical limitations in mind should give you an idea of when it makes sense.
2. It is not bad, as long as it fits your use case(s). If you have a given category and its _id, it is easy to find the related products. When loading a product, you can easily get the categories it belongs to, and efficiently too, since _id is indexed by default.
3. I have yet to find a use case that cannot be done with MongoDB, though some things may get a bit more complicated. What you should do, IMHO, is take the sum of your functional and non-functional requirements and check whether the advantages outweigh the disadvantages. My rule of thumb: if "scalability" or "high availability / automatic failover" is on your list of requirements, MongoDB is worth more than a look.
