O (1) to determine if a node is a descendant of another node in a multi-track tree?

Imagine the following tree:

A / \ BC / \ \ DEF 

I am looking for a way to query if, for example, F is a descendant of A (note: F does not have to be a direct descendant of A), which would be true in this particular case. Only a limited number of potential parent nodes should be tested with a large pool of potential children.

When checking whether a node is a descendant of a node in the potential parent pool, it must be checked for ALL potential parent nodes.

Here's what they came up with:

  • Convert multi-channel tree to three, i.e. Assign the following prefixes to each node in the above tree:

      A = 1 B = 11 C = 12 D = 111 E = 112 F = 121 
  • Then reserve a bitmap for each possible prefix size and add parent nodes for verification, i.e. if C is added to the pool of potential parent nodes, do:

      1 2 3 <- Prefix length *[1] [1] ... [2] *[2] ... [3] [3] ... [4] [4] ... ... ... 
  • When checking if a node is a descendant of a potential parent node, take its trie prefix, look for the first character in the first "prefix array" (see above) and, if present, look for the second prefix character in the second "prefix". array ", etc., i.e. checking F results in:

      F = 1 2 1 *[1] [1] ... [2] *[2] ... [3] [3] ... [4] [4] ... ... ... 

    so yes, F, is a descendant of C.

This test looks like the worst case O (n), where n = maximum prefix length = maximum tree depth, so its worst case is exactly equal to the obvious way to climb up the tree and compare nodes. However, this works much better if the test node is at the bottom of the tree, and the potential parent node is somewhere at the top. Combining both algorithms will reduce both worst-case scenarios. However, lack of memory is a concern.

Is there any other way to do this? Any pointers are much appreciated!

+9
source share
7 answers

Are your input trees always static? If so, then you can use the Lowest Common Ancestor algorithm to answer the question about the descendant in O (1) time with the construction of the time / space O (n). An LCA query is specified by two nodes and is asked which is the lowest node in the tree whose subtree contains both nodes. You can then answer the IsDescendent request with one LCA request, if LCA (A, B) == A or LCA (A, B) == B, then one is a descendant of the other.

This Topcoder tuorial algorithm provides a detailed discussion of the problem and several solutions at different levels of code complexity / efficiency.

+6
source

I don’t know if this will fit your problem, but one way to store hierarchies in databases is with a quick “give me everything from this node and down function” to save the “path”.

For example, for a tree that looks like this:

  +-- b | a --+ +-- d | | +-- c --+ | +-- e 

you would save the lines as follows, assuming that the letter in the above tree is the "id" of each line:

 id path aa ba*b ca*c da*c*d ea*c*e 

To find all the descendants of a particular node, you should make a query "STARTSWITH" in the path column, i.e. all nodes with a path starting with a*c*

To find out if a particular node is a descendant of another node, you will see if the longest path was run with the shortest path.

So for example:

  • e is a descendant of a, since a*c*e starts with a
  • d is a descendant of c since a*c*d starts with a*c

Would this be useful in your instance?

+3
source

To move through the tree, the steps "depth of the tree" are required. Therefore, if you maintain a balanced tree structure, this proves that you will need O (log n) operations for your search operation. From what I understand, your tree looks special and you cannot maintain it in a balanced way, right? So O (n) will be possible. But this is bad when creating a tree anyway, so you are likely to die before using lookup anyway ...

Depending on how often you need this search operation compared to paste , you may decide to pay at the time of insertion to save additional composition data. I would suggest hashing if you really need to amortise O (1). With each insert operation, you put all the parents of the node in the hash table. According to your description, these can be O (n) elements in a given insert . If you do n inserts , this sounds bad (in the direction of O (n ^ 2)), but actually your tree cannot make it worse, so you probably get the amortized total size of H (n log n)). (In fact, the logarithmic part depends on the degree of degradation of your tree. If you expect that it will be the maximum degradation, do not do this.)

So, you pay about O (log n) for each insert and get the efficiency of the O (1) hash table for searching .

+2
source

For the M-way tree instead of your bitmap, why not just store the binary "trie id" (using M bits per level) with each node? For your example (assuming M == 2): A=0b01, B=0b0101, C=0b1001, ...

Then you can run the test in O (1):

 bool IsParent(node* child, node* parent) { return ((child->id & parent->id) == parent->id) } 

You can compress the storage bit at the ceil (lg2 ​​(M)) level to a level if you have a quick FindMSB () function that returns the position of the most significant bit:

 mask = (1<<( FindMSB(parent->id)+1) ) -1; retunr (child->id&mask == parent->id); 
+1
source

When passing in order, each set of descendants is adjacent. For your example

 ABDECF +---------+ A +---+ B + D + E +-+ C + F 

If you can do the preprocessing, then you just need to specify the node number and calculate the interval of the descendants.

If you cannot pre-process, then the link / cut tree offers O (log n) performance for both updates and queries.

+1
source

You can respond to a request of the form "Is node A a descendant of node B?" in constant time, just using two auxiliary arrays.

Pre-process the tree by visiting in the order of depth-first order, and for each node A, save the start and end time in the visit in the two arrays Start [] and End [].

So, let's say that End [u] and Start [u] are the final and initial time of visiting node u.

Then node u is a descendant of node v if and only if:

Start [v] <= Start [u] and End [u] <= End [v].

and you’re done, to check this condition requires only two searches in the Start and End arrays

0
source

Take a look at the nested set model. It’s very effective to choose, but update too slowly

0
source

All Articles