I have the following two data structures.
First , a list of properties applied to object tags:
Object1 Object2 Object3 Property Value O1 O2 O3 P1 "abc" O1 O2 O3 P2 "xyz" O1 O3 O4 P1 "123" O2 O4 O5 P1 "098"
Second , the inheritance tree:
O1 O2 O4 O3 O5
Or regarded as a relation:
Object Parent O2 O1 O4 O2 O3 O1 O5 O3 O1 null
The semantics of this is that O2 inherits properties from O1; O4 is from O2 and O1; O3 - from O1; and O5 are from O3 and O1 in this priority order.
NOTE 1: I have an effective way to select all children or all parents of a given object. Currently this is done with left and right indexes, but a hierarchy can also work. Now this does not seem important.
NOTE 2: I have a tiger to make sure that the “Object” column always contains all the possible objects, even if they really should not be there (ie Do not have a parent or children). This allows you to use inner join , rather than significantly less effecient outer join s.
Purpose : if a pair is specified (Property, value), return all triples of objects that have this property with a value that is explicitly defined or inherited from the parent.
NOTE 1: An object triple (X,Y,Z) is considered a “parent” triple (A,B,C) when it is true, either either X = A or X is a parent of A , and the same is true for (Y,B) and (Z,C) .
NOTE 2: The property specified in the closer parent "overrides" the same property that is defined for the more distant parent.
NOTE 3: When (A, B, C) has two parents - (X1, Y1, Z1) and (X2, Y2, Z2), then (X1, Y1, Z1) is considered a “closer” parent when:
(a) X2 is the parent of X1, or
(b) X2 = X1 and Y2 is the parent of Y1, or
(c) X2 = X1 and Y2 = Y1, and Z2 is the parent of Z1
In other words, “proximity” in the pedigree for triples is determined first on the first components of the triple, then on the second components, then on the third components. This rule establishes an unambiguous partial order for triples in terms of a pedigree.
For example, given the pair (P1, "abc"), the result set of triples will be:
O1, O2, O3 -- Defined explicitly O1, O2, O5 -- Because O5 inherits from O3 O1, O4, O3 -- Because O4 inherits from O2 O1, O4, O5 -- Because O4 inherits from O2 and O5 inherits from O3 O2, O2, O3 -- Because O2 inherits from O1 O2, O2, O5 -- Because O2 inherits from O1 and O5 inherits from O3 O2, O4, O3 -- Because O2 inherits from O1 and O4 inherits from O2 O3, O2, O3 -- Because O3 inherits from O1 O3, O2, O5 -- Because O3 inherits from O1 and O5 inherits from O3 O3, O4, O3 -- Because O3 inherits from O1 and O4 inherits from O2 O3, O4, O5 -- Because O3 inherits from O1 and O4 inherits from O2 and O5 inherits from O3 O4, O2, O3 -- Because O4 inherits from O1 O4, O2, O5 -- Because O4 inherits from O1 and O5 inherits from O3 O4, O4, O3 -- Because O4 inherits from O1 and O4 inherits from O2 O5, O2, O3 -- Because O5 inherits from O1 O5, O2, O5 -- Because O5 inherits from O1 and O5 inherits from O3 O5, O4, O3 -- Because O5 inherits from O1 and O4 inherits from O2 O5, O4, O5 -- Because O5 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
Please note that the triple (O2, O4, O5) is not in this list. This is due to the fact that property P1 is explicitly defined for the triple (O2, O4, O5), and this prevents this triple from inheriting this property (O1, O2, O3). Also note that the triple (O4, O4, O5) is also missing. This is due to the fact that this triple inherits its value P1 = "098" from (O2, O4, O5), because it is a closer parent than (O1, O2, O3).
An easy way to do this is as follows. First, for each triple defined by a property, select all possible trinities for children:
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value from TriplesAndProperties tp -- Select corresponding objects of the triple inner join Objects as Objects1 on Objects1.Id = tp.O1 inner join Objects as Objects2 on Objects2.Id = tp.O2 inner join Objects as Objects3 on Objects3.Id = tp.O3 -- Then add all possible children of all those objects inner join Objects as Children1 on Objects1.Id [isparentof] Children1.Id inner join Objects as Children2 on Objects2.Id [isparentof] Children2.Id inner join Objects as Children3 on Objects3.Id [isparentof] Children3.Id
But this is not the whole story: if a triple inherits the same property from several parents, this query will give conflicting results. Therefore, the second step is to select only one of these conflicting results:
select * from ( select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value, row_number() over( partition by Children1.Id, Children2.Id, Children3.Id, tp.Property order by Objects1.[depthInTheTree] descending, Objects2.[depthInTheTree] descending, Objects3.[depthInTheTree] descending ) as InheritancePriority from ... (see above) ) where InheritancePriority = 1
The window function row_number() over( ... ) performs the following actions: for each unique combination of triple and property objects, it sorts all the values ​​at the distance of the ancestors from the triple to the parents to which the value is inherited, and then I select only the very first one received in The result is a list of values. A similar effect can be achieved using the GROUP BY and ORDER BY statements, but I just find that the window function is semantically cleared (the execution plans they give are identical). The fact is that I need to choose the closest of my predecessors, and for this I need to group, and then sort inside the group.
And finally, now I can just filter the result set using the property and value.
This circuit works. Very reliable and predictable. It turned out to be very powerful for the business task that it implements.
The only problem: she is awfuly slow .
It can be noted that joining seven tables can slow down the work, but in fact this is not a bottleneck.
According to the actual execution plan that I get from SQL Management Studio (as well as SQL Profiler), sorting is the bottleneck. The problem is that in order to satisfy my window function, the server must sort by Children1.Id, Children2.Id, Children3.Id, tp.Property, Parents1.[depthInTheTree] descending, Parents2.[depthInTheTree] descending, Parents3.[depthInTheTree] descending , and there can be no indexes that it can use, because the values ​​come from the cross join of several tables.
EDIT: At the suggestion of Michael Buen (thanks, Michael), I posted the whole riddle for sqlfiddle here.In terms of execution, you can see that the Sort operation takes into account 32% of the entire query, and this will grow with the number of full rows, since all other operations use indexes .
Usually in such cases, I would use an indexed view, but not in this case, because indexed views cannot contain self-joins, of which six.
The only way I can think of so far is to create six instances of the Objects table and then use them for joins, which will allow indexing the view.
Is it time that I will be reduced to such hacks? Despair comes.