It seems to me that you can get rid of most of the vertex grepping by tracking the end nodes when adding them to the graph and when processing the node list.
if( @dependencies ) { for my $dependency (@dependencies) { unless ($g->has_vertex($dependency)) { $g->add_vertex($dependency); } $g->add_edge($component, $dependency); } } else { push @end_components, $component; }
Then, when you process your dataset, start one thread of execution (however you implement your parallelism) for each end of the node. When the thread terminates, all parent nodes without other successors are added to the end of the node. Continue to process the end of the node until both the graphs and the node are empty.
sub run_in_parallel { my $g = shift->copy; my $end_vertices = shift; my @queue; while( $g->vertices ) { print "running ", join(", ", @queue), " in parallel\n"; for my $compenent (@queue) {
This change adds a bit of writing, and is a little more fragile than your original idea. Consider encapsulating these behaviors in an object that either contains or inherits from Graph .
The biggest performance benefit will be when using large dependency graphs. Minimal improvement will be observed with small graphs (where the cost of grepping is tiny compared to the cost of process control) or where the graphs are very small.
This code is not fully verified.
daotoad
source share