As already written, there are no rules for MongoDB comparable to SQL's second normal form.
However, there are some best practices and common pitfalls related to optimization for MongoDB, which I will list here.
Overuse of embedding
The BSON size limit
Contrary to popular belief, there is nothing wrong with references. Suppose you have a library of books and want to track the rentals. You could start with a model like this:
```javascript
{
  // We use the ISBN for its uniqueness
  _id: "9783453031456",
  title: "Schismatrix",
  author: "Bruce Sterling",
  rentals: [
    {
      name: "Markus Mahlberg",
      start: "2015-05-05T03:22:00Z",
      due: "2015-05-12T12:00:00Z"
    }
  ]
}
```
While there are several problems with this model, the most important one is not obvious: the number of rentals is limited, because BSON documents have a size limit of 16 MB.
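To put a number on that limit, here is a minimal sketch in plain JavaScript (Node.js, no driver) that estimates the BSON size of the embedded model above. The size rules follow the BSON specification for the few types used here (strings, doubles, booleans, null, dates, embedded documents and arrays); the helper names are hypothetical, and the 16 MiB constant is the documented maximum document size. Treat it as an illustrative estimate, not a replacement for a driver's own size calculation.

```javascript
// Minimal BSON size estimator (sketch). Covers only the types used in this
// example; real drivers handle many more (ObjectId, int32/int64, binary, ...).
const utf8 = new TextEncoder();

// Documented maximum size of a single BSON document: 16 MiB.
const BSON_MAX_DOCUMENT_SIZE = 16 * 1024 * 1024;

// Keys are encoded as NUL-terminated UTF-8 strings.
function cstringSize(s) {
  return utf8.encode(s).length + 1;
}

function valueSize(v) {
  if (typeof v === "string") return 4 + utf8.encode(v).length + 1; // int32 length + bytes + NUL
  if (typeof v === "number") return 8; // assumed to be stored as a double
  if (typeof v === "boolean") return 1;
  if (v === null) return 0;
  if (v instanceof Date) return 8; // int64 milliseconds since the epoch
  if (Array.isArray(v)) {
    // Arrays are encoded as documents whose keys are "0", "1", "2", ...
    return documentSize(Object.fromEntries(v.map((x, i) => [String(i), x])));
  }
  if (typeof v === "object") return documentSize(v);
  throw new TypeError(`unsupported value type: ${typeof v}`);
}

// Document = int32 total length + elements + trailing 0x00;
// element = type byte + key + value.
function documentSize(doc) {
  let size = 4 + 1;
  for (const [key, value] of Object.entries(doc)) {
    size += 1 + cstringSize(key) + valueSize(value);
  }
  return size;
}

// How many copies of `rental` fit into book.rentals before the book
// document would exceed the BSON size limit.
function maxEmbeddedRentals(book, rental) {
  let size = documentSize({ ...book, rentals: [] });
  const perRental = documentSize(rental);
  let count = 0;
  for (;;) {
    // Each array element costs: type byte + index key + embedded document.
    const next = size + 1 + cstringSize(String(count)) + perRental;
    if (next > BSON_MAX_DOCUMENT_SIZE) return count;
    size = next;
    count++;
  }
}

const book = { _id: "9783453031456", title: "Schismatrix", author: "Bruce Sterling" };
const rental = { name: "Markus Mahlberg", start: "2015-05-05T03:22:00Z", due: "2015-05-12T12:00:00Z" };
console.log(maxEmbeddedRentals(book, rental)); // 167210
```

The exact figure depends on field names and values; the point is that the ceiling exists at all, whereas a separate rentals collection imposes no per-book limit.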
The problem of document migration
Another problem with storing rentals in an array is that it leads to relatively frequent document migrations, which are rather expensive operations. BSON documents are never fragmented; instead, they are created with some additional preallocated space which they can use when they grow. This additional space is called padding. When the padding is exceeded, the document is moved to another location in the data files and new padding space is allocated. So frequently adding data causes frequent document migrations. Hence, it is best practice to avoid frequent updates that increase the size of a document, and to use references instead.
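The cost of this pattern can be illustrated with a deliberately simplified toy model in plain JavaScript (padding and document moves are specific to the MMAPv1 storage engine). The model assumes, loosely in the spirit of MMAPv1's power-of-two record allocation, that a document's slot is the next power of two at or above its size and that outgrowing the slot forces a move. The starting size and per-rental growth are made-up numbers and the function names are hypothetical.

```javascript
// Toy model of in-place growth vs. relocation. Assumption: a record's slot
// is the next power of two >= its size, and a document that outgrows its
// slot is moved to a fresh, larger slot. Sizes are illustrative only.
function nextPowerOfTwo(n) {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

function simulateEmbeddedRentals(initialSize, bytesPerRental, rentals) {
  let size = initialSize;
  let slot = nextPowerOfTwo(size);
  let moves = 0;
  for (let i = 0; i < rentals; i++) {
    size += bytesPerRental; // each appended rental makes the book document grow
    if (size > slot) {
      slot = nextPowerOfTwo(size); // padding exhausted: relocate the document
      moves++;
    }
  }
  return { size, slot, moves };
}

// Embedded model: the book document grows with every rental.
console.log(simulateEmbeddedRentals(100, 100, 10)); // { size: 1100, slot: 2048, moves: 4 }
```

With a 100-byte document growing by 100 bytes per rental, ten rentals already cause four moves in this model. With references, the book document never changes size and each rental is inserted as a new document, so no existing document has to move.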
So instead, we would split our single model into two. First, a model for the book:
```javascript
{
  _id: "9783453031456",
  title: "Schismatrix",
  author: "Bruce Sterling"
}
```
The second model, for a rental, looks like this:
```javascript
{
  _id: new ObjectId(),
  book: "9783453031456",
  rentee: "Markus Mahlberg",
  start: ISODate("2015-05-05T03:22:00Z"),
  due: ISODate("2015-05-05T12:00:00Z"),
  returned: ISODate("2015-05-05T11:59:59.999Z")
}
```
The same approach can, of course, be used for the author or the rentee.
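Splitting rentals out also makes per-rental questions easy to answer. The following plain JavaScript sketch uses an array as a stand-in for the rentals collection (the second document and its rentee are hypothetical) and classifies each rental; `openRentals` mirrors what a query such as `db.rentals.find({ book: isbn, returned: { $exists: false } })` would return against a real collection. The function names are illustrative, not part of any API.

```javascript
// Hypothetical in-memory stand-in for the rentals collection; field names
// follow the rental model above (_id omitted for brevity).
const rentals = [
  {
    book: "9783453031456",
    rentee: "Markus Mahlberg",
    start: new Date("2015-05-05T03:22:00Z"),
    due: new Date("2015-05-05T12:00:00Z"),
    returned: new Date("2015-05-05T11:59:59.999Z"),
  },
  {
    book: "9783453031456",
    rentee: "Jane Doe", // hypothetical, still-open rental
    start: new Date("2015-05-06T09:00:00Z"),
    due: new Date("2015-05-13T12:00:00Z"),
  },
];

// Classify a single rental relative to a reference time.
function rentalStatus(rental, now) {
  if (rental.returned) {
    return rental.returned <= rental.due ? "returned on time" : "returned late";
  }
  return now > rental.due ? "overdue" : "open";
}

// Roughly what db.rentals.find({ book: isbn, returned: { $exists: false } })
// would return against a real collection.
function openRentals(isbn) {
  return rentals.filter((r) => r.book === isbn && r.returned === undefined);
}

const now = new Date("2015-05-10T00:00:00Z");
for (const r of rentals) console.log(`${r.rentee}: ${rentalStatus(r, now)}`);
console.log(openRentals("9783453031456").length); // 1
```

Because every rental is its own small document, adding one is an insert rather than an update that grows the book document.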
The overnormalization problem
Let's take a step back. A developer would identify the entities involved in the use case, determine their properties and relations, write the corresponding entity classes, bang his head against the wall for several hours to get the required triple inner-outer-self JOIN working for the use case, and everyone lived happily ever after. So why bother with NoSQL in general, and MongoDB in particular? Because nobody lived happily ever after. This approach scales horribly, and almost exclusively the only way to scale it is vertically.
But the main difference with NoSQL is that you model your data according to the questions you need answered.
So let's look at a typical n:m relationship, taking the relation of authors to books as our example. In SQL, you would have 3 tables: two for your entities (books and authors) and one for the relationship (who is an author of which book?). Of course, you could take these tables and create equivalent collections. But since there are no JOINs in MongoDB, you would need three queries (one for the entity, one for its relations and one for the related entities) to find the documents related to an entity. This would not make sense, since the three-table approach for n:m relationships was invented specifically to overcome the strict schemas of SQL databases. Since MongoDB has a flexible schema, the first question is where to store the relationship, while keeping in mind the problems arising from overuse of embedding. Since an author may write quite a few books over the coming years, but the authorship of a book rarely, if ever, changes, the answer is simple: we store the authors as references within the book documents:
```javascript
{
  _id: "9783453526723",
  title: "The Difference Engine",
  authors: ["idOfBruceSterling", "idOfWilliamGibson"]
}
```
Now we can find the authors of this book with two queries:
```javascript
var book = db.books.findOne({ title: "The Difference Engine" })
var authors = db.authors.find({ _id: { $in: book.authors } })
```
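The two queries can be mirrored in plain JavaScript to show how the references are resolved, using arrays as hypothetical stand-ins for the books and authors collections (the Schismatrix entry is given an `authors` array here purely for illustration). The sketch also covers the reverse direction: because MongoDB matches a scalar condition against each element of an array field, `db.books.find({ authors: "idOfBruceSterling" })` finds all books by an author in a single query.

```javascript
// Hypothetical in-memory stand-ins for the books and authors collections.
const authors = [
  { _id: "idOfBruceSterling", name: "Bruce Sterling" },
  { _id: "idOfWilliamGibson", name: "William Gibson" },
];
const books = [
  { _id: "9783453526723", title: "The Difference Engine", authors: ["idOfBruceSterling", "idOfWilliamGibson"] },
  { _id: "9783453031456", title: "Schismatrix", authors: ["idOfBruceSterling"] },
];

// Query 1, like db.books.findOne({ title: ... }): first match or null.
function findBookByTitle(title) {
  return books.find((b) => b.title === title) || null;
}

// Query 2, like db.authors.find({ _id: { $in: ids } }).
// (A real $in query does not promise any particular result order.)
function findAuthorsByIds(ids) {
  const wanted = new Set(ids);
  return authors.filter((a) => wanted.has(a._id));
}

// Reverse lookup, like db.books.find({ authors: authorId }):
// a scalar condition matches if any array element equals it.
function findBooksByAuthor(authorId) {
  return books.filter((b) => b.authors.includes(authorId));
}

const book = findBookByTitle("The Difference Engine");
console.log(findAuthorsByIds(book.authors).map((a) => a.name).join(", ")); // Bruce Sterling, William Gibson
console.log(findBooksByAuthor("idOfBruceSterling").map((b) => b.title).join(", ")); // The Difference Engine, Schismatrix
```

On a real collection, an index on `authors` (for example `db.books.createIndex({ authors: 1 })`) becomes a multikey index, which keeps that reverse lookup efficient as the number of books grows.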
I hope this helps you decide when to actually split your collections and how to avoid the most common pitfalls.
Conclusion
As for your questions, here are my answers:
- As written earlier: no, but keeping the technical limitations in mind should give you an idea of when it might make sense.
- This is not bad, as long as it suits your use case(s). If you have a given category and its _id, it is easy to find the related products. When you load a product, you can easily get the categories it belongs to, and efficiently so, since _id is indexed by default.
- I have yet to find a use case that cannot be done with MongoDB, although some things can get a bit more complicated with MongoDB. What you should do, imho, is take the sum of your functional and non-functional requirements and check whether the advantages outweigh the disadvantages. My rule of thumb: if your list of requirements contains either "scalability" or "high availability / automatic failover", MongoDB is worth more than just a look.