How do you iterate over a tree?

What is your preferred method for moving the tree data structure, as in some cases calls to recursive methods can be quite inefficient. I just use a generator like the one above. Do you have tips to make it faster?

def children(self): stack = [self.entities] while stack: for e in stack.pop(): yield e if e.entities: stack.append(e.entities) 

Here are some test data. The first is recursive, the second uses a generator:

 s = time.time() for i in range(100000): e.inc_counter() print time.time() - s s = time.time() for i in range(100000): for e in e.children(): e.inc_counter_s() print time.time() - s 

Results:

 0.416000127792 0.298999786377 

Test code:

 import random class Entity(): def __init__(self, name): self.entities = [] self.name = name self.counter = 1 self.depth = 0 def add_entity(self, e): e.depth = self.depth + 1 self.entities.append(e) def inc_counter_r(self): for e in self.entities: e.counter += 1 e.inc_counter_r() def children(self): stack = [self.entities] while stack: for e in stack.pop(): yield e if e.entities: stack.append(e.entities) root = Entity("main") def fill_node(root, max_depth): if root.depth <= max_depth: for i in range(random.randint(10, 15)): e = Entity("node_%s_%s" % (root.depth, i)) root.add_entity(e) fill_node(e, max_depth) fill_node(root, 3) import time s = time.time() for i in range(100): root.inc_counter_r() print "recursive:", time.time() - s s = time.time() for i in range(100): for e in root.children(): e.counter += 1 print "generator:", time.time() - s 
+4
source share
8 answers

I can’t come up with any big algorithmic improvements, but the simple microoptimization you can do is bind commonly called methods (e.g. stack.append / stack.pop) to local ones (this saves the dictionary search)

 def children(self): stack = [self.entities] push = stack.append pop = stack.pop while stack: for e in pop(): yield e if e.entities: push(e.entities) 

This gives a slight (~ 15%) acceleration of my tests (using 100 tree walks with 8 depths with 4 children in each node, gives me the following timings :)

 children : 5.53942348004 children_bind: 4.77636131253 

Not huge, but worth doing if speed is important.

+5
source

If your tree is really large or you have really high (real) speed requirements, I would choose the recursive method. Easier to read, easier to code.

+5
source

I'm not sure that you can significantly reduce the overhead by completely traversing the tree in order, if you use recursion, the call stack will grow, otherwise you must manually use the stack to click on child links when visiting each node. Which method is the fastest and uses less memory depends on the high cost of the call stack and the regular stack. (I would suggest that callstack is faster, as it must be optimized for its use, and recursion is much easier to implement)

If you don't like the order you visit the nodes, some tree implementations are actually stored in a dynamic array or a linked list or stack, which you can move linearly if you don't need the order that it went through.

But why is it so important to have a quick detour? Trees are good for searching, arrays / linked lists are good for full traversal. If you often need a full crawl in order, but a few searches and inserts / exceptions, an ordered linked list may be the best, if the search is what you do most, you use a tree. If the data is really massive, so memory overhead can make recursion impossible, you should use a database.

+4
source

Recursive function calls are not incredibly inefficient; this is an old programming myth. (If they are poorly implemented, they may entail more overhead than necessary, but calling them “incredibly ineffective” is simply wrong).

Remember: do not optimize prematurely and never optimize without benchmarking.

+4
source

If you have a lot of RAM, and the tree does not change often, you can cache the result of the call:

 def children(self): if self._children_cache is not None: return self._children_cache # Put your code into collectChildren() self._children_cache = self.collectChildren() return self._children_cache 

Whenever the tree changes, set the cache to None. In this case, the use of recursive calls can be more efficient, since the results will accumulate faster.

+3
source

In the past, I wrote an iterative tree traversal code: it is very ugly, not fast, unless you know for sure how many children not only each subtree will have, but also the number of levels.

+1
source

I don’t know too much about Python's internal functions function calls, but I really can’t imagine that your piece of code is faster than traversing a tree recursively.

The call stack (used for function calls, including recursive ones) is usually very fast. Going to the next object will cost you only one function call. But in your fragment where you use the stack object, the transition to the next object will cost you stack.append (possibly allocating memory on the heap), stack.push (possibly freeing memory from the heap) and profitability.

The main problem with recursive calls is that you can explode the stack if your tree gets too deep. This is unlikely to happen.

0
source

Here are a couple of small corrections.

 def children(self): stack = [self.entities] for e in stack: yield e if e.entities: stack.extend(e.entities) 

I really think that the generator, using append, does not visit all nodes. I think you mean extend stack with all entities, not append simple list of entities on the stack.

Also, when the for loop ends, the while in your original example will also end because there are no changes to the empty stack after the for loop.

0
source

All Articles