MongoDB Design Guidelines

Question


Say you want to model a specific situation. A company may have one or more branches, and these branches have employees who can work for different companies (or even in two different branches of the same company). This, of course, is just an example.

Suppose also that most queries will be run against the employee and company collections.

The first (naive) way to do this is to embed everything (the company has an array of branches, and each branch has an array of employees):

{
  name: "Company name",
  // other company data
  branches: [
    {
      name: "Branch name",
      // other branch data
      employees: [
        { /* employee 1 data */ },
        { /* employee 2 data */ }
      ]
    }
  ]
}

But this would be very inefficient if someone wanted information about employees (you would need to fetch a company and then iterate through each branch to find the employee you are looking for).

On the other hand, you could use references and imitate an RDBMS (with separate company, branch and employee collections), but this would mean more queries.

The third option (the one I am leaning towards) is to have Employee as a separate collection and to keep an array of references to it in the branches. In addition, to speed up queries such as "employees with a specific name who work for a specific company and a specific branch", the company's ObjectId can be stored in the Employee collection:

{
  company_id: "some id",
  first_name: "First name",
  last_name: "Last name",
  // other employee data
}

So, in this case, finding all employees with a certain name who work for a certain company and a certain branch would take two queries. The first query returns the companies that satisfy the "company condition" (company name and branch name), and the second query, against the Employee collection, returns all employees who have the given name and who work for one of the companies whose ids were returned by the first query.
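To make the two-step lookup concrete, here is a minimal sketch in plain JavaScript. The arrays are in-memory stand-ins for the two collections, all sample data and the function names `findCompanyIds` and `findEmployees` are made up for illustration, and the comments show roughly what the equivalent MongoDB shell queries might look like.

```javascript
// In-memory stand-ins for the two collections (all data is made up).
const companies = [
  { _id: 1, name: "Aperture Science", branches: [{ name: "R&D" }, { name: "Testing" }] },
  { _id: 2, name: "Black Mesa", branches: [{ name: "R&D" }] },
];
const employees = [
  { company_id: 1, first_name: "Chell", last_name: "Doe" },
  { company_id: 2, first_name: "Gordon", last_name: "Freeman" },
  { company_id: 1, first_name: "Gordon", last_name: "Smith" },
];

// Query 1: ids of the companies that satisfy the "company condition".
// Against MongoDB this would be roughly:
//   db.companies.find({ name: companyName, "branches.name": branchName }, { _id: 1 })
function findCompanyIds(companyName, branchName) {
  return companies
    .filter((c) => c.name === companyName && c.branches.some((b) => b.name === branchName))
    .map((c) => c._id);
}

// Query 2: employees with the given first name in any of those companies.
//   db.employees.find({ first_name: firstName, company_id: { $in: companyIds } })
function findEmployees(firstName, companyIds) {
  return employees.filter(
    (e) => e.first_name === firstName && companyIds.includes(e.company_id)
  );
}

const ids = findCompanyIds("Aperture Science", "R&D");
const matches = findEmployees("Gordon", ids);
console.log(matches);
```

Note that with only `company_id` stored on the employee, the second query cannot tell which branch an employee belongs to; the branch name only narrows down the companies.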

Would you do it differently? Is there any other “recommended” way to do this? Could you add some improvements?

More importantly, what should be done when these two queries each return many results that have only a small intersection? How can performance be improved in that case?


mongodb nosql

kevin Dec 26 '12 at 18:00


1 answer

Philipp · Accepted Answer · 2012-12-27T13:09:13+0000

I think that you are mostly heading in the right direction.

Denormalization is not the evil in MongoDB that it is in a relational database, and it is often the right choice, but this is a case where you should use multiple collections. The reason is that MongoDB documents have an upper size limit of 16 MB. With a very large company that has many branches with many employees each, and with employee sub-documents that grow more complex over time, you can easily exceed this limit.
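A quick back-of-the-envelope calculation shows how fast a fully embedded company document can grow. The branch count, headcount and per-employee size below are invented numbers, not measurements, and the helper `estimateEmbeddedSize` is hypothetical.

```javascript
// Rough size estimate for a company document with all employees embedded.
// Every figure passed in below is an assumption chosen for illustration.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // the 16 MB document limit

function estimateEmbeddedSize({ branches, employeesPerBranch, bytesPerEmployee }) {
  return branches * employeesPerBranch * bytesPerEmployee;
}

const estimate = estimateEmbeddedSize({
  branches: 200,
  employeesPerBranch: 500,
  bytesPerEmployee: 250,
});
const fits = estimate <= MAX_DOCUMENT_BYTES;
console.log(`${estimate} bytes, fits in one document: ${fits}`);
// prints "25000000 bytes, fits in one document: false"
```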

Having a reference from the employee to the company is a good idea. But you should consider using not the company's _id field but rather the company name and branch name, provided you can guarantee that each combination is unique in the companies collection (for example, with a unique compound index on these two fields). The reason is that when you look at an employee, you usually also need the names of the company and branch. When you only have the _id, you need additional queries to get this information.

You said that you don't have a 1:n relationship between branches and employees but rather an n:m relationship. In this case, I would recommend adding an "assignments" array to each employee, containing objects with two fields, company and branch (you might want to add a third field, "position", that says what he or she does there).

Your employee documents would then look like this:
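As a sketch of what such a uniqueness guarantee amounts to, the snippet below imitates it in plain JavaScript. The field names `company` and `branch` and the class `CompanyBranchStore` are assumptions made for illustration; in MongoDB itself the guarantee would come from the index shown in the comment.

```javascript
// With MongoDB, the guarantee would come from a unique compound index, roughly:
//   db.companies.createIndex({ company: 1, branch: 1 }, { unique: true })
// (field names are assumed here). The class below only imitates that rule in memory.
class CompanyBranchStore {
  constructor() {
    this.keys = new Set();
    this.docs = [];
  }

  insert(doc) {
    // The pair (company, branch) acts as the compound key.
    const key = JSON.stringify([doc.company, doc.branch]);
    if (this.keys.has(key)) {
      throw new Error(`duplicate company/branch combination: ${key}`);
    }
    this.keys.add(key);
    this.docs.push(doc);
  }
}

const store = new CompanyBranchStore();
store.insert({ company: "Aperture Science", branch: "R&D" });
store.insert({ company: "Black Mesa", branch: "R&D" }); // same branch name, other company: fine

let rejected = false;
try {
  store.insert({ company: "Aperture Science", branch: "R&D" }); // same pair again
} catch (err) {
  rejected = true;
}
console.log(`documents stored: ${store.docs.length}, duplicate rejected: ${rejected}`);
```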

{
  first_name: "First name",
  last_name: "Last name",
  // other employee data
  assignments: [
    { company: "Aperture Science", branch: "R&D", position: "test subject" },
    { company: "Black Mesa", branch: "security", position: "leader of blue shift" }
  ]
}
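With this layout, finding the employees of one company and branch takes a single query. The sketch below uses an in-memory array with made-up data and a hypothetical helper `worksAt`; the comment shows roughly how the same condition could be written with `$elemMatch`, which requires company and branch to match within the same array element.

```javascript
// Made-up sample data following the document layout above.
const employees = [
  {
    first_name: "Chell",
    last_name: "Doe",
    assignments: [{ company: "Aperture Science", branch: "R&D", position: "test subject" }],
  },
  {
    first_name: "Barney",
    last_name: "Calhoun",
    assignments: [
      { company: "Aperture Science", branch: "security", position: "guard" },
      { company: "Black Mesa", branch: "R&D", position: "consultant" },
    ],
  },
];

// Roughly equivalent MongoDB shell query:
//   db.employees.find({
//     assignments: { $elemMatch: { company: "Aperture Science", branch: "R&D" } }
//   })
// Both conditions must hold for the SAME assignment, which is what some() checks here.
function worksAt(employee, company, branch) {
  return employee.assignments.some((a) => a.company === company && a.branch === branch);
}

const found = employees.filter((e) => worksAt(e, "Aperture Science", "R&D"));
console.log(found.map((e) => e.first_name)); // prints [ 'Chell' ]
```

The second employee has an assignment at Aperture Science and another in an R&D branch, but not both in one assignment, so he is not returned. A query using plain dot notation on `assignments.company` and `assignments.branch` would match him as well.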

Please note: here you can use the power of a schema-less database. You can easily have some companies that have not only branches but further hierarchy levels (for example, departments and groups) and others that do not.

But what if I want to rename a company or branch?

In that case, you will have to update every employee document that refers to the renamed company or branch. Yes, this is not the most efficient schema for that case. But remember that MongoDB schemas should always be optimized for the most common use cases. What do you think will happen more often: a) a company or branch is renamed, or b) someone wants to look up an employee?
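A rename therefore touches every employee document that mentions the old name. The sketch below shows the idea on an in-memory array; the function `renameCompany` and the sample data are made up, and the comment shows roughly how it might be written as one multi-document update using `arrayFilters` (available in MongoDB 3.6 and later).

```javascript
// Possible MongoDB shell equivalent (MongoDB 3.6+), with oldName/newName as placeholders:
//   db.employees.updateMany(
//     { "assignments.company": oldName },
//     { $set: { "assignments.$[a].company": newName } },
//     { arrayFilters: [{ "a.company": oldName }] }
//   )
// In-memory imitation: rewrite the name in every matching assignment and
// return how many employee documents were modified.
function renameCompany(employees, oldName, newName) {
  let modified = 0;
  for (const employee of employees) {
    let touched = false;
    for (const assignment of employee.assignments) {
      if (assignment.company === oldName) {
        assignment.company = newName;
        touched = true;
      }
    }
    if (touched) modified += 1;
  }
  return modified;
}

// Made-up sample data.
const staff = [
  { last_name: "Doe", assignments: [{ company: "Aperture Science", branch: "R&D" }] },
  { last_name: "Freeman", assignments: [{ company: "Black Mesa", branch: "R&D" }] },
  {
    last_name: "Smith",
    assignments: [
      { company: "Aperture Science", branch: "security" },
      { company: "Black Mesa", branch: "security" },
    ],
  },
];

const modifiedCount = renameCompany(staff, "Aperture Science", "Aperture Laboratories");
console.log(`employee documents updated: ${modifiedCount}`);
```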
