On a graph, how to calculate the sum of all nodes that a node can reach efficiently?

The specified directed graph with the weight assigned to each node.
Start with any node A, there will be a set of nodes that can be reached from A.
Define SUM as the total weight of this set.

Question: How to calculate the SUM for each node in this graph efficiently?
A graph can contain millions of nodes.

Update 1:
The graph data structure consists of a set of starting nodes, and each node has a set of point nodes.

What I tried:
Calculate the descendants of each node recursively and calculate the SUM from the set of descendants. This recursion is very inefficient, since I have to repeatedly execute the union, uteration command. In addition, I tried to cache a set of children on each node. However, he easily runs out of memory.
So is there any other solution?

Update 2:
Graph example, all edges are directed, upper nodes point to lower nodes
Sample graph
The result should be SUM (1) = 55 SUM (2) = 35 SUM (3) = 41 SUM (4) = 19 SUM (5) = 22 SUM (6) = 25 ...

+7
source share
2 answers

Locate the SCC graph (Tarjan algorithm or DFS dual run).

For each SCC, calculate the sum of its node weights; denote this value as PARTIAL-SUM.

SCC iterations in reverse topological order; for each node in each SCC, its SUM will be the sum of all the SUM values ​​of the neighboring SCC plus its own PARTIAL-SUM value.

The linear run time is O(E+V) , since the SCC search is linear, the topological type is linear, and the summation is linear, since we visit each SCC no more than once and each branch no more than once.

EDIT

As noted in tzkuzzy's comments, parallel paths create a problem. This is easily solved by simply running the DFS on the SCC chart. At any intersection, we simply take the already visited node DFS tree until we reach the not-completely distorted parent, this pair of nodes (visited below and the ancestor) has two different paths between them, we make a list for each node of such descendants and when summing, we simply subtract their PARTIAL-SUM value.

So if:

  u / \ \ / w 

Our DFS will take the cross-border connecting the node child element from u to w and return to u (for those familiar with the typical DFS taught in schools, the simplest explanation is that u characterized as the first gray ancestor of w ), therefore we add w to the list that we support on u .

Then, when we summarize each SCC-neighboring SCC, as described, we add an extra step in which we iterate over the list and simply subtract the PARTIAL-SUM values.

DFS itself is still linear. The rollback from node to the ancestor can be linear if we cache the results (thus, we do not cross the same edge more than once). And the additional work in the summation is not more than O(V) , so we did not change the working time.

EDIT

Inclusion-exclusion makes this more difficult than I thought. This solution is not complete and does not work. A simple BFS for each node is more expensive, but simpler and will work.

+10
source

My previous answer is a good idea, but including - excluding intersecting subtrees is very difficult to solve (something you think your recursive approach will also suffer from). Think about it, I am starting to doubt that there is even an optimality substructure in this problem that gives a dynamic programming solution ...

I suggest a much simpler (albeit less efficient) solution:

For each node, run BFS / DFS and summarize the values ​​of all the nodes found. It will be executed in O(V) space and O(V(E+V)) = O(EV) time, but it is very simple to implement and does not need recursive work.

You can optimize by finding the SCC graph and instead of doing BFS / DFS on each node do it once for each SCC. This can increase uptime as a huge factor if the graph is "cliquey".

0
source

All Articles