Dijkstra's Algorithm: Memory Consumption

I have a Dijkstra Algorithm implementation based on the code on this website . Basically, I have several nodes (say, 10000), and each node can have from 1 to 3 connections to other nodes.

Nodes are randomly generated in three-dimensional space. Connections are also randomly generated, however, he always tries to first find connections with the nearest neighbors and slowly increases the search radius. Each connection has a distance of up to one. (I doubt it all matters, but it's just the background).

In this case, the algorithm is used only to search for the shortest number of transitions from the starting point to all other nodes. And it works well for 10,000 nodes. The problem is that, as the number of nodes increases, say, to 2 million, I use the entire memory of my computers when trying to build a graph.

Does anyone know of an alternative way to implement an algorithm to reduce memory or is there another algorithm that uses less memory?

+7
source share
3 answers

According to your comment above, you represent the edges of the graph using the distance matrix long dist[GRAPHSIZE][GRAPHSIZE] . This takes up O(n^2) memory, which is too large for large n values. This is also not a good idea in terms of runtime when you have only a small number of edges: this will cause Dijkstra's algorithm to take O(n^2) (where n is the number of nodes) when it can potentially be faster, depending on the data structures used.

Since in your case you said that each node is connected only to three other nodes, you should not use this matrix: instead, for each node you should save the list of nodes to which it is connected.Then, when you want to go through the neighbors of the node, you just need to iterate over this list.

In some specific cases, you donโ€™t even need to save this list, because it can be calculated for each node when necessary. For example, when a graph is a grid, and each node is connected to neighboring grid nodes, it is easy to find the neighbors of a node on the fly.

+7
source

If you really cannot afford the memory , even with the minimizations in your graphical representation , you can develop the Dijkstra algorithm algorithm , taking into account the divide and win .

The idea is to divide the data into secondary fragments, so that you can execute the Dijkstra algorithm in each fragment for each of the points inside it.

For each solution generated in these junior fragments, consider it as a unique node for another piece of data from which you will start another Dijkstra run.

For example, consider the following points:

 .B .C .E .A .D .F .G 

You can select the nearest points for a given node, say, for two jumps, and then use the solution as part of an extended graph, considering the previous points as only one set of points with a distance equal to the resulting distance of the Dijkstra solution.

Say you start with D :

  • select closest points - D within the specified number of hops ;
  • use Dijkstra's algorithm for selected records, starting with D ;
  • use a graphical solution with a central node D and the last nodes in the shortest paths as nodes directly connected to D ;
  • we expand the graph by repeating the algorithm until all nodes are considered.

Although there is expensive additional processing here , you can overcome the memory limit , and if you have other machines, you can even distribute the processes.

Please note that this is just a process idea; the process I described is not always the best way to do this. You may find something interesting in the search for Dijkstra's distributed algorithm.

+3
source

I like boost :: graph. The memory consumption is very decent (I used it on road networks with 10 million nodes and 2 GB of RAM).

It has a Dijkstra implementation, but if the goal is to implement and understand it yourself, you can still use their graph representation (I offer an adjacency list) and compare your result with them to make sure your result is correct.

Some people mentioned other algorithms. I do not think that this will play a big role in the use of memory, but rather in speed. 2M, if the topology is close to the network, the run time will be less than a second from one node to all the others.

http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/index.html

+1
source

All Articles