What is the fastest implementation for a problem with paired shortcuts?

I have a weighted graph of 30k nodes of 160k edges, no negative weights. I would like to calculate all the shortest paths from all nodes to the rest. I think that I cannot accept any specific heuristics to simplify this problem.

I tried to use this implementation of Dijkstra C http://compprog.wordpress.com/2007/12/01/one-source-shortest-path-dijkstras-algorithm/ , that is, for one shortest problem calling the dijkstras () function for everything my 30 knots. As you can imagine, this takes a lot of time. At the moment, I don’t have time to write and debug the code myself, I have to calculate these paths as soon as possible and save them in the database, so I'm looking for another quick solution, ready to use, do you have any tips?

I need to run it on a recent macbook pro with 8 GB of RAM, and I would like to find a solution that takes no more than 24 hours to complete.

Thank you very much in advance!

Eugenio

+7
source share
5 answers

I looked at the Dijkstra algorithm link that you posted in the comments, and I think this is the source of your inefficiency. Inside the Dijkstra inner loop, he uses an extremely non-optimized approach to determine which node should be explored next (linear scan through each node at each step). The problem code is in two places. The first is code that tries to find the following node to work:

mini = -1; for (i = 1; i <= n; ++i) if (!visited[i] && ((mini == -1) || (d[i] < d[mini]))) mini = i; 

Since this code is nested inside the loop that each node visits, the complexity (as mentioned in the link) is O (| V | 2 ), where | V | - number of nodes. In your case, since | V | is 30,000, there will be nine hundred million iterations of this cycle as a whole. It is very slow (as you saw), but there is no reason to do this work.

Here is another problem that tries to find which edge in the graph should be used to reduce the cost of other nodes:

 for (i = 1; i <= n; ++i) if (dist[mini][i]) if (d[mini] + dist[mini][i] < d[i]) d[i] = d[mini] + dist[mini][i]; 

This scans the entire row in the adjacency matrix, looking for nodes to consider, which takes O (n) time no matter how many outgoing edges come out of the node.

Although you can try to fix this version of Dijkstra into a more optimized implementation, I think the correct option here is to simply throw away this code and find a better implementation of Dijkstra's algorithm. For example, if you use the pseudocode from a Wikipedia article implemented with a binary heap , you can get the Dijkstra algorithm running in O (| E | log | V |). In your case, this value is just over two million, which is about 450 times your current approach. This is a huge difference, and I bet that with a better implementation of Dijkstra you will get code that ends much shorter than before.

Alternatively, you might consider running all Dijkstra searches in parallel, as pointed out by Jacob Eggers. This camera gives you an extra speed boost for every processor you have. Combined with the above (and more critical) fix, this is likely to give you a huge increase in performance.

If you plan to run this algorithm on a much denser data set (in which the number of edges approaches | V | 2 / log | V |), then you might want to switch to the Floyd-Warshall algorithm . Running Dijkstra's algorithm once per node (sometimes called Johnson's algorithm ) takes O (| V || E | log | V |) time, while using Floyd-Warshall it takes O (| V | 3 ) time. However, for the dataset that you mentioned, the graph is pretty sparse that running multiple instances of Dijkstra should be great.

Hope this helps!

+12
source

What about the Floyd-Warshall algorithm ?

+3
source

Does your schedule have any special structure? Is the chart flat (or nearly so)?

I would recommend not trying to store all the shortest paths, a fairly dense encoding (30k ^ 2 "where to go next") will take 7 gigabytes of memory.

What is an app? Are you sure that performing bidirectional Dijkstra (or A * if you have a heuristic) will not be fast enough when you need to find a specific shortest path?

+2
source

If you can change the algorithm to multithreading, you can complete it in less than 24 hours.

The first node may take more than 1 minute. However, the 15,000th node should only pass half this time, because you would calculate the shortest paths to all previous nodes.

0
source

A bottleneck could be your data structure in which you use storage paths. If you use too much storage, you will soon run out of cache and memory, which will speed up working with a fast algorithm, because it gets about 100 (cache miss) or 10000+ (replaced pages) constant multiplier.

Since you must store the paths in a database, I suspect this could easily be a bottleneck. It is probably best to try to first generate memory paths with a very efficient storage mode, for example, N bits per vertex, where N == maximum number of edges per vertex. Then set a bit for each edge that you can use to create one of the shortest paths. After generating this path information, you can run a recursive algorithm that generates path information in a format suitable for storing the database.

Of course, the database is still the most likely bottleneck. You want to think very carefully which format you use to store information, because inserting, searching, and modifying large datasets in an SQL database is very slow. In addition, using transactions to perform database operations can reduce the burden of writing to the disk if the database engine controls multiple inserts for the write operation to one disk.

Better yet, get simple results in the cache cache and discard decisions if they are no longer needed. Then generate the same results again if you need them. This means that you only generate paths on demand when you need them. The runtime for 30k nodes and 160k edges should be clearly below a second for one entire Dijkstra shortest path.

For the shortest path algorithms, I always chose C ++. There should be no reason why the C implementation would not be simple, but C ++ offers abbreviated coding using STL containers that can be used in the initial implementation, and only later implement an optimized queue algorithm if tests and profiling indicate that it is necessary have something better than STL offers.

 #include <queue> #include "vertex.h" class vertex; class edge; class searchnode { vertex *dst; unsigned long dist; public: searchnode(vertex *destination, unsigned long distance) : dst(dst), dist(distance) { } bool operator<(const searchnode &b) const { /* std::priority_queue stores largest value at top */ return dist > b.dist; } vertex *dst() const { return dst; } unsigned long travelDistance() const { return dist; } }; static void dijkstra(vertex *src, vertex *dst) { std::priority_queue<searchnode> queue; searchnode start(src, 0); queue.push(start); while (!queue.empty()) { searchnode cur = queue.top(); queue.pop(); if (cur.travelDistance() >= cur.dst()->distance()) continue; cur.dst()->setDistance(cur.travelDistance()); edge *eiter; for (eiter = cur.dst()->begin(); eiter != cur.dst()->end(); eiter++) { unsigned nextDist = cur.dist() + eiter->cost(); if (nextDist >= eiter->otherVertex()) continue; either->otherVertex()->setDistance(nextdist + 1); searchnode toadd(eiter->othervertex(), nextDist); queue.push(toadd); } } } 
0
source

All Articles