MySQL hierarchical storage with large trees

I don't know how to store my hierarchical data in my InnoDB table.

I have read a lot about the disadvantages of storing a parent_id in each row. The problem is that I have a very large table (~50 million rows). The hierarchy itself is not very deep (3-6 levels).

Many websites recommend the Nested Sets Model as the best alternative to storing the parent ID. But users of the website make changes all the time (UPDATE, INSERT, etc.), and because of the size of my table this takes too long, since changes in the Nested Sets Model are very slow.

So my question is: how do you efficiently store large hierarchical data that receives many UPDATE / INSERT statements? (Locking the whole table is not an option, which is why it is an InnoDB table.)

+4
2 answers

The Nested Sets design is definitely difficult when you need to update the tree frequently. You end up having to renumber large parts of the tree.

One suggestion to mitigate this is to use floating-point numbers instead of integers. When you insert a new node into the tree, it is relatively easy to find FLOAT values between the nested-set numbers of the new node's parent. You may eventually reach the precision limit of a floating-point number, but since your tree is not very deep, that won't happen for a long time.
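A minimal sketch of the floating-point idea, assuming a hypothetical `tree` table with `lft` / `rgt` columns and using Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. The helper names and the choice to split the free gap into thirds are illustrative assumptions, not part of the answer.

```python
import sqlite3

# Hypothetical schema: nested-set boundaries stored as floating-point values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, name TEXT, lft REAL, rgt REAL)"
)
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?, ?)",
    [(1, "root", 1.0, 6.0), (2, "a", 2.0, 3.0), (3, "b", 4.0, 5.0)],
)


def add_last_child(conn, parent_id, node_id, name):
    """Insert a node as the last child of parent_id without renumbering."""
    parent_lft, parent_rgt = conn.execute(
        "SELECT lft, rgt FROM tree WHERE id = ?", (parent_id,)
    ).fetchone()
    # Largest boundary already used inside the parent (NULL if it has no children).
    (last_rgt,) = conn.execute(
        "SELECT MAX(rgt) FROM tree WHERE lft > ? AND rgt < ?",
        (parent_lft, parent_rgt),
    ).fetchone()
    low = last_rgt if last_rgt is not None else parent_lft
    # Place the new pair at one third and two thirds of the free gap.
    step = (parent_rgt - low) / 3.0
    new_lft, new_rgt = low + step, low + 2 * step
    if not (low < new_lft < new_rgt < parent_rgt):
        raise ValueError("float precision exhausted; this branch needs renumbering")
    conn.execute(
        "INSERT INTO tree VALUES (?, ?, ?, ?)", (node_id, name, new_lft, new_rgt)
    )
    return new_lft, new_rgt


def subtree(conn, node_id):
    """Return the names in a node's subtree with one range query."""
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM tree WHERE id = ?", (node_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT name FROM tree WHERE lft >= ? AND lft <= ? ORDER BY lft", (lft, rgt)
    )
    return [name for (name,) in rows]


add_last_child(conn, 1, 4, "c")   # only one row is written
add_last_child(conn, 4, 5, "d")
print(subtree(conn, 1))
```

Each insert touches a single row. The precision check is where a real system would have to fall back to renumbering the affected branch.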

Another technique I've written about is what I call Closure Table. This way of storing hierarchies makes it much easier to insert / update / delete nodes in a large tree without having to update a large number of rows. And you can still query the whole tree or any subtree in a single non-recursive SQL query.
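A minimal sketch of a closure table, again using Python's built-in `sqlite3` as a stand-in for MySQL. The table names `node` and `closure`, the `depth` column, and the helper functions are assumptions made for illustration; the idea is one row per ancestor/descendant pair, including each node paired with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row for every (ancestor, descendant) pair, including node-to-itself.
CREATE TABLE closure (
    ancestor   INTEGER NOT NULL REFERENCES node (id),
    descendant INTEGER NOT NULL REFERENCES node (id),
    depth      INTEGER NOT NULL,   -- assumed extra column: path length
    PRIMARY KEY (ancestor, descendant)
);
""")


def add_node(conn, node_id, name, parent_id=None):
    """Insert a node; only rows for its own ancestors are written."""
    conn.execute("INSERT INTO node (id, name) VALUES (?, ?)", (node_id, name))
    # Every node is its own ancestor at depth 0.
    conn.execute(
        "INSERT INTO closure (ancestor, descendant, depth) VALUES (?, ?, 0)",
        (node_id, node_id),
    )
    if parent_id is not None:
        # Copy the parent's ancestor rows, pointing them at the new node.
        conn.execute(
            "INSERT INTO closure (ancestor, descendant, depth) "
            "SELECT ancestor, ?, depth + 1 FROM closure WHERE descendant = ?",
            (node_id, parent_id),
        )


def subtree(conn, node_id):
    """Return all names under node_id (inclusive) with one non-recursive query."""
    rows = conn.execute(
        "SELECT n.name FROM closure c JOIN node n ON n.id = c.descendant "
        "WHERE c.ancestor = ? ORDER BY c.depth, n.id",
        (node_id,),
    )
    return [name for (name,) in rows]


add_node(conn, 1, "root")
add_node(conn, 2, "a", parent_id=1)
add_node(conn, 3, "b", parent_id=1)
add_node(conn, 4, "a1", parent_id=2)
print(subtree(conn, 1))
print(subtree(conn, 2))
```

With a tree that is only 3-6 levels deep, an insert writes at most a handful of closure rows, regardless of how many rows the table holds in total.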

See below for more details on Closure Table:


Re your comment:

Adjacency List is simple, has minimal redundancy, and supports FK relationships, which Nested Sets does not. Adjacency List supports querying a whole tree of arbitrary depth if you use recursive queries. But MySQL did not support recursive queries until version 8.0 added recursive common table expressions.

If you only need to query direct parent-child relationships (i.e. one level of depth), or otherwise only query trees of a fixed depth, then Adjacency List is fine.
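A minimal sketch of the fixed-depth case, assuming a hypothetical `category` table with a `parent_id` column and using Python's built-in `sqlite3` in place of MySQL. Direct children need one indexed lookup; a fixed number of levels needs one self-join per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES category (id),   -- NULL for a root
    name      TEXT NOT NULL
);
CREATE INDEX idx_category_parent ON category (parent_id);

INSERT INTO category (id, parent_id, name) VALUES
    (1, NULL, 'root'),
    (2, 1,    'a'),
    (3, 1,    'b'),
    (4, 2,    'a1'),
    (5, 4,    'a1x');
""")


def children(conn, node_id):
    """Direct children only: a single lookup on the parent_id index."""
    rows = conn.execute(
        "SELECT name FROM category WHERE parent_id = ? ORDER BY id", (node_id,)
    )
    return [name for (name,) in rows]


def three_levels(conn, root_id):
    """Fixed depth of three levels: one LEFT JOIN per extra level."""
    rows = conn.execute(
        "SELECT l1.name, l2.name, l3.name "
        "FROM category l1 "
        "LEFT JOIN category l2 ON l2.parent_id = l1.id "
        "LEFT JOIN category l3 ON l3.parent_id = l2.id "
        "WHERE l1.id = ? ORDER BY l2.id, l3.id",
        (root_id,),
    )
    return rows.fetchall()


print(children(conn, 1))
print(three_levels(conn, 1))
```

Nodes deeper than the joined levels (here `a1x`) are simply not returned, which is the limitation of this approach when the depth is not known in advance.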

+2

For hierarchical data, I like to split the hierarchy out into its own table. For example, when dealing with a hierarchy of employees, I usually do something like this:

    create table employee (
        id serial primary key,
        name varchar(50)
    );

    create table roster (
        id serial primary key,
        employee_id int references employee (id),
        supervisor_id int references employee (id)
    );

This can be extended to support historical hierarchies by adding a row_date column, or start_date and stop_date columns, to the roster table.

Make sure you have unique constraints and triggers in place, where applicable, to enforce business rules.
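A minimal sketch of the historical variant, using Python's built-in `sqlite3` in place of a real server. The sample rows, the rule that a NULL `stop_date` means "still current", the half-open date ranges, and the `supervisor_on` helper are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE roster (
    id            INTEGER PRIMARY KEY,
    employee_id   INTEGER NOT NULL REFERENCES employee (id),
    supervisor_id INTEGER REFERENCES employee (id),
    start_date    TEXT NOT NULL,   -- ISO dates compare correctly as text
    stop_date     TEXT,            -- assumed: NULL means still current
    UNIQUE (employee_id, start_date)
);

INSERT INTO employee (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO roster (employee_id, supervisor_id, start_date, stop_date) VALUES
    (3, 1, '2020-01-01', '2021-01-01'),
    (3, 2, '2021-01-01', NULL);
""")


def supervisor_on(conn, employee_id, day):
    """Return the supervisor's name on a given day, or None if there was none."""
    row = conn.execute(
        "SELECT s.name FROM roster r "
        "JOIN employee s ON s.id = r.supervisor_id "
        "WHERE r.employee_id = ? AND r.start_date <= ? "
        "AND (r.stop_date IS NULL OR r.stop_date > ?)",
        (employee_id, day, day),
    ).fetchone()
    return row[0] if row else None


print(supervisor_on(conn, 3, "2020-06-15"))
print(supervisor_on(conn, 3, "2021-01-01"))
```

Changing a supervisor then means closing the current roster row and inserting a new one, so the employee table itself is never rewritten.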

+1
