How to efficiently query a directed acyclic graph

I use mysql for one in my web applications. The application table contains a manager table and an employee table. The employee table contains information about each employee. The supervisor table contains two columns as follows.

supervisor_id -> which is employee id of the supervisor subordinate_id -> which is the employee id of the subordinate. 

Each subordinate may have several managers, and one subordinate may be the leader of some other employee. Thus, the record table can be an ass as follows.

 supervisor_id | subordinate_id 1 | 2 1 | 3 2 | 4 4 | 5 3 | 6 3 | 4 

In the above example, there is a supervisor chain. Manager 1 has 2, 3, 4, 5, and 6 as his subordinates. Head 2 has 4, 5 as subordinates. And also he can have several leaders for the subordinate.

When I request all subordinates for supervisor 2 currently, I use a query similar to the following.

 public function getSubordinate($id) { $query = "SELECT * FROM supervisor WHERE subordinate_id = $id"; // get results and return } 

So, what I'm doing now is first send the identifier as 2 to get its immediate subordinates. Then for each resulting subordinate, I run the query again and again to get a complete chain of subordinates.

This is normal with a small data set. But this supervisor table will contain thousands of data, so I need to execute thousands of queries to find the chain of supervisors, and it takes time to get the results.

Since subordinates can have multiple supervisors, a nested set will not be the exact answer for this.

I went through this decision as well. http://www.codeproject.com/Articles/22824/A-Model-to-Represent-Directed-Acyclic-Graphs-DAG-o

But when I use this method, it will have millions of data with this table. and it is inefficient.

My problem is whether there is an effective way to do this. Are there any problems with my table structure that prevent me from effectively performing such queries.

+5
source share
2 answers

You say an acyclic graph, so maybe I left here - but does it sound at the same time that you need something for ordinary supervisors and a hierarchy of employees? so can this be done with a tree structure?

I'm not sure, but it looks like you only need a tree structure? and I think that the easiest way to get who is above one person is to save all the names in one table and use two fields to update the relationship between people. Fields will be left and right.

  _______ 1 | peter | 20 _______ ______ ______ 2 | paul | 17 18 | john | 19 ______ ______ _____ _______ 3 |judas | 4 5 | maria | 16 _____ _______ _____ ________ 6 |seth | 7 8 | abraham | 15 _____ _______ ______ 9 |bill | 14 _____ _____ _______ 10 |kenny | 11 12 | moses | 13 _____ _______ 

Which of the bosses of Moses? all with the highest right and lover on the left give you Bill, Abraham, Mary, Paul and Peter :-) It does not take time in the database to pull out. If this is interesting, I can update this answer with details on how to do this.

  table left right peter 1 20 paul 2 7 judas 3 4 maria 5 16 seth 6 7 ... etc select * from people where left < 12 and right > 13 

result:

  bill 9 14 abraham 8 15 maria 4 16 paul 2 17 peter 1 20 
0
source

All major database applications (including MySQL and MariaDB) now support recursive queries using common table expressions. This was introduced in MySQL version 8.0 and MariaDB version 10.2.2. PostgreSQL had support even earlier. Oracle has this, and SQL Server added this with the 2005 version. In fact, a quick search reveals that Sqlite also supports Common Table Expressions.

Therefore, the answer you can look for is to use common table expressions and recursive queries. This explains some of the reasons why this is considered a better solution compared to the " Representation Model of Directional Acyclic Graphs (DAG) in SQL Databases ":

Coding and querying graphs in a relational model
https://drtom.ch/posts/2012/02/11/Encoding_and_Querying_Graphs_in_the_Relational_Model/

(You can ignore the part where he says: “This will not work, particularly on MySQL or sqlite3, which simply do not support CTE.” As I said, this is no longer the case.)

As you noted in your question, "when I use this method, there will be millions of data in this table." This in itself may not be so bad if you exchanged space for efficiency everywhere, but as Dr. Tom Post explains in one example:

The operation of removing or inserting a red arc also requires effort in θ (n ^ 2).

This is an n-square force for these operations; You get query efficiency, but at the expense of space inefficiency and insertion / deletion inefficiencies. He goes on to point out that

almost all major real-world networks are rare. They have much fewer edges than would be possible, i.e. M "n ^ 2.

In fairness, the Cam Erdogan article you linked was written in 2008; CTEs were not public then. In addition, Erdogan made an informed choice regarding compromise, as he explained here:

The solution I have is based on recursion [too]. However, instead of deferring recursion until the request time, I do recursion during insertion, assuming that the graph is actually more requested than modified (which is true for all the cases that I have encountered so far).

If, after reading Dr. Tom’s article, you ultimately prefer Erdogan’s compromises, you can limit other inefficiencies by looking at the Laravel implementation here:

GitHub - telkins / laravel-dag-manager: SQL-based directed acyclic graph (DAG) solution for Laravel. https://github.com/telkins/laravel-dag-manager

In particular, look at Max Hops and implement something similar in your own solution.

This is in the Laravel configuration file:

 /* |-------------------------------------------------------------------------- | Max Hops |-------------------------------------------------------------------------- | | This value represents the maximum number of hops that are allowed where | hops "[i]ndicates how many vertex hops are necessary for the path; it is | zero for direct edges". | | The more hops that are allowed (and used), then the more DAG edges will | be created. This will have an increasing impact on performance, space, | and memory. Whether or not it negligible, noticeable, or impactful | depends on a variety of factors. */ 'max_hops' => 5, 

Disclaimer: I am only now investigating this for myself. I have no experience using any of these solutions yet.

0
source

All Articles