LINQ efficient graph traversal - eliminating recursion

Today I was going to implement a method for moving an arbitrarily deep graph and smooth it into one enumerated one. Instead, I first searched a bit and found this:

public static IEnumerable<T> Traverse<T>(this IEnumerable<T> enumerable, Func<T, IEnumerable<T>> recursivePropertySelector) { foreach (T item in enumerable) { yield return item; IEnumerable<T> seqRecurse = recursivePropertySelector(item); if (seqRecurse == null) continue; foreach (T itemRecurse in Traverse(seqRecurse, recursivePropertySelector)) { yield return itemRecurse; } } } 

In theory, this looks good, but in practice, I found that it works much worse than using the equivalent handwriting code (as the situation arises) to go through the chart and do whatever needs to be done. I suspect that this is due to the fact that in this method for each returned item the stack must be torn off to some arbitrarily deep level.

I also suspect that this method will work much more efficiently if recursion is eliminated. I am also not very good at recursion.

Does anyone know how to rewrite this method to avoid recursion?

Thanks for any help.

EDIT: Thanks so much for all the detailed answers. I tried to compare the original solution with Eric's solution and not use the enumerator method, and instead of doing it recursively intersecting with lambda and strangely, lambda recursion is much faster than either of the other two methods.

 class Node { public List<Node> ChildNodes { get; set; } public Node() { ChildNodes = new List<Node>(); } } class Foo { public static void Main(String[] args) { var nodes = new List<Node>(); for(int i = 0; i < 100; i++) { var nodeA = new Node(); nodes.Add(nodeA); for (int j = 0; j < 100; j++) { var nodeB = new Node(); nodeA.ChildNodes.Add(nodeB); for (int k = 0; k < 100; k++) { var nodeC = new Node(); nodeB.ChildNodes.Add(nodeC); for(int l = 0; l < 12; l++) { var nodeD = new Node(); nodeC.ChildNodes.Add(nodeD); } } } } nodes.TraverseOld(node => node.ChildNodes).ToList(); nodes.TraverseNew(node => node.ChildNodes).ToList(); var watch = Stopwatch.StartNew(); nodes.TraverseOld(node => node.ChildNodes).ToList(); watch.Stop(); var recursiveTraversalTime = watch.ElapsedMilliseconds; watch.Restart(); nodes.TraverseNew(node => node.ChildNodes).ToList(); watch.Stop(); var noRecursionTraversalTime = watch.ElapsedMilliseconds; Action<Node> visitNode = null; visitNode = node => { foreach (var child in node.ChildNodes) visitNode(child); }; watch.Restart(); foreach(var node in nodes) visitNode(node); watch.Stop(); var lambdaRecursionTime = watch.ElapsedMilliseconds; } } 

If TraverseOld is the original method, TraverseNew is Eric's method, and obviously lambda is lambda.

On my machine, TraverseOld takes 10,127 ms, TraverseNew takes 3,038 ms, lambda recursion takes 1,181 ms.

Is it typical that enumerator methods (with return of returns) may take 3X, rather than immediate execution? Or is something else happening here?

+8
c # graph linq recursion
source share
4 answers

Firstly, you are absolutely right; if the graph has n nodes of average depth d, then naive nested iterators give a solution that is O (n * d) in time, and O (d) on the stack. If d is a large part of n, then this can become an O (n 2 ) algorithm, and if d is large, you can completely pop the stack.

If you are interested in analyzing the performance of nested iterators, see the previous post by the Wes Dyer C # compiler developer:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

The dasblinkenlight solution is a variant of the standard approach. I usually write a program as follows:

 public static IEnumerable<T> Traverse<T>( T root, Func<T, IEnumerable<T>> children) { var stack = new Stack<T>(); stack.Push(root); while(stack.Count != 0) { T item = stack.Pop(); yield return item; foreach(var child in children(item)) stack.Push(child); } } 

And if you have several roots:

 public static IEnumerable<T> Traverse<T>( IEnumerable<T> roots, Func<T, IEnumerable<T>> children) { return from root in roots from item in Traverse(root, children) select item ; } 

Now note that the crawl is not what you want if you have a very interconnected graph or a cyclic graph! If you have a chart with down arrows:

  A / \ B-->C \ / D 

then bypass A, B, D, C, D, C, D. If you have a cyclical or interconnected schedule, then you want this transitive closure.

 public static IEnumerable<T> Closure<T>( T root, Func<T, IEnumerable<T>> children) { var seen = new HashSet<T>(); var stack = new Stack<T>(); stack.Push(root); while(stack.Count != 0) { T item = stack.Pop(); if (seen.Contains(item)) continue; seen.Add(item); yield return item; foreach(var child in children(item)) stack.Push(child); } } 

This change only yields items that were not previously received.

I am also not very good at recursion.

I wrote a series of articles on ways to eliminate recursion and recursive programming in general. If this topic interests you, see:

http://blogs.msdn.com/b/ericlippert/archive/tags/recursion/

In particular:

http://blogs.msdn.com/b/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx

+19
source share

You are right when walking trees and graphs recursively in code that yield return is a big source of inefficiency.

Typically, you rewrite a recursive code with a stack - similar to how it is usually implemented in compiled code.

I did not have the opportunity to try, but this should work:

 public static IEnumerable<T> Traverse<T>(this IEnumerable<T> enumerable, Func<T, IEnumerable<T>> recursivePropertySelector) { var stack = new Stack<IEnumerable<T>>(); stack.Push(enumerable); while (stack.Count != 0) { enumerable = stack.Pop(); foreach (T item in enumerable) { yield return item; var seqRecurse = recursivePropertySelector(item); if (seqRecurse != null) { stack.Push(seqRecurse); } } } } 
+7
source share

You can always eliminate recursion by replicating the basics of how recursion works with the stack.

  • put the first item at the top of the stack
  • While the stack is not empty, pull an item from the stack
  • if the current node has children, add them to the stack
  • Yield returns the current item.
  • Go to step 1!

Crazy smart theoretical answer: stack overflow

http://cs.saddleback.edu/rwatkins/CS2B/Lab%20Exercises/Stacks%20and%20Recursion%20Lab.pdf

+2
source share

You can use the queue in your code. A queue can be initialized as a list with one element equal to the top node. Then you should iterate over each element of the list, starting from the first. If the first element contains child nodes, you add them all to the end of the queue. Then move on to the next item. When you get to the end of the line, your schedule will be completely smoothed out.

0
source share

All Articles