Storing and querying Heirarchical data with multiple parent nodes

I searched quite a lot, but could not find many resources on this topic. My goal is to store planning data, as you might find in a Gantt chart. Thus, one example of data storage can be:

Task Id | Name | Duration 1 Task A 1 2 Task B 3 3 Task C 2 Task Id | Predecessors 1 Null 2 Null 3 1 3 2 

Why will you need task C, waiting for the completion of task A and task B.

So my question is: what is the best way to store this type of data and query it efficiently? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add a few parents, it becomes difficult to find the information. By the way, I am working with SQL Server and .NET for this task.

+4
source share
3 answers

Your problem is related to the concept of the cardinality of relations. All relations have some power, which expresses the potential number of instances on each side of the relations that are its members, or can participate in one instance of the relationship. For example, for humans (for most living things, I think, with rare exceptions), the Parent-Child relationship has a power of 2 to zero or many , which means that two parents are required on the parent side, and there can be zero or many children (maybe it should be 2 to 1 or many )

In database design, as a rule, everything that has one (one), (or zero or one), on the one hand, can easily be represented by only two tables, one for each object (sometimes only one table is required; see note **) and the foreign key column in the table representing the "many" side, which points to another table containing the object on the "one" side.

In your case, you have a many to many relationship. (A task can have several predecessors, and each of the predecessors can certainly be a predecessor for several tasks). In this case, a third table is needed, where each row effectively represents the relationship between the two tasks, representing it as the predecessor of Others. Typically, this table is intended to contain only all the primary key columns of the two parent tables, and the own primary key is the collection of all columns in both parent primary keys. In your case, these are just two columns: taskId and PredecessorTaskId, and this pair of identifiers should be unique in the table, so together they form a composite PK.

When prompted to avoid double counts of data columns in parent tables when there are multiple joins, simply enter the query into the parent table ... for example, to find the duration of the longest parent, assuming your association table is called TaskPredecessor

  Select TaskId, Max(P.Duration) From Task T Join Task P On P.TaskId In (Select PredecessorId From TaskPredecessor Where TaskId = T.TaskId) 

** NOTE: In cases where both objects in the relationship have the same entity type, they can be in the same table. The canonical (luv is a word) example is a table of employees with many employee relations to the Supervisor ... Since the supervisor is also an employee, both employees and managers can be in the same table [Employee], and reality can be modeled using a foreign key (called SupervisorId) that points to another row in the same table and contains the employee record identifier for that employee.

+2
source

Use the adjacency list model:

 chain task_id predecessor 3 1 3 2 

and this query to find all the predecessors of this task:

 WITH q AS ( SELECT predecessor FROM chain WHERE task_id = 3 UNION ALL SELECT c.predecessor FROM q JOIN chain c ON c.task_id = q.predecessor ) SELECT * FROM q 

To get the duration of the longest parent for each task:

 WITH q AS ( SELECT task_id, duration FROM tasks UNION ALL SELECT t.task_id, t.duration FROM q JOIN chain  ON c.task_id = q.task_id JOIN tasks t ON t.task_id = c.predecessor ) SELECT task_id, MAX(duration) FROM q 
+2
source

Check the Hierarchical Weighted Summary template in the SQL Design Patterns book or the Bill of Materials section in the Trees and Hierarchies in SQL section.

In a word, graphs have double aggregation. You perform one type of aggregation by nodes in each path, and the other by alternative paths. For example, to find the minimum distance between two nodes is minimal by summing. A hierarchical weighted general request (aka Bill Materials) is the multiplication of values ​​along each path and the summation over each alternative path:

     
        with TCAssembly as (
           select Part, SubPart, Quantity AS factoredQuantity
           from AssemblyEdges
           where Part = 'Bicycle'
           union all
           select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
           from TCAssembly te, AssemblyEdges e
           where te.SubPart = e.Part
        ) select SubPart, sum (Quantity) from TCAssembly
        group by SubPart        
+1
source

All Articles