My goal is to write a shortest path algorithm for a road network.
Currently, my architecture looks something like this: I store all the data in a PostgreSQL database with PostGIS enabled. I run a single SELECT * FROM ways query, which takes less than 3 seconds on a table with 100,000 edges (ways), and then apply the shortest path algorithm (in Java, Ruby or anything else) to the graph that is now in memory. The second operation takes about 1.5 seconds on a graph with 100,000 edges.
So it takes:
- 2-3 seconds to load all the ways from the database into memory and build the graph (node information is stored in the same table as the ways (edges));
- 1-1.5 seconds to calculate the shortest path on the graph that is already in memory (both steps are sketched below).
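For concreteness, here is a minimal Java sketch of that two-step pipeline. It assumes a pgRouting-style schema (columns source, target and cost in the ways table) and an undirected network; those column names are assumptions, not details from my actual setup.

    import java.sql.*;
    import java.util.*;

    public class InMemoryRouter {

        // In the adjacency list this is an outgoing edge; in the priority
        // queue it is reused as (node reached, distance from start).
        static class Edge {
            final long node;
            final double cost;
            Edge(long node, double cost) { this.node = node; this.cost = cost; }
        }

        // adjacency list: node id -> outgoing edges
        private final Map<Long, List<Edge>> graph = new HashMap<>();

        // Step 1: one full-table SELECT (the 2-3 second part).
        public void load(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT source, target, cost FROM ways")) {
                while (rs.next()) {
                    long s = rs.getLong("source");
                    long t = rs.getLong("target");
                    double c = rs.getDouble("cost");
                    graph.computeIfAbsent(s, k -> new ArrayList<>()).add(new Edge(t, c));
                    // assuming an undirected network, add the reverse edge too
                    graph.computeIfAbsent(t, k -> new ArrayList<>()).add(new Edge(s, c));
                }
            }
        }

        // Step 2: plain Dijkstra with a binary heap (the 1-1.5 second part).
        // Returns the distance from `start` to every reachable node.
        public Map<Long, Double> dijkstra(long start) {
            Map<Long, Double> settled = new HashMap<>();
            PriorityQueue<Edge> queue =
                    new PriorityQueue<>(Comparator.comparingDouble((Edge e) -> e.cost));
            queue.add(new Edge(start, 0.0));
            while (!queue.isEmpty()) {
                Edge current = queue.poll();
                if (settled.containsKey(current.node)) continue; // already finalized
                settled.put(current.node, current.cost);
                for (Edge e : graph.getOrDefault(current.node, Collections.emptyList())) {
                    if (!settled.containsKey(e.node)) {
                        queue.add(new Edge(e.node, current.cost + e.cost));
                    }
                }
            }
            return settled;
        }
    }

With 100,000 edges a HashMap-based adjacency list is workable; replacing it with int-indexed arrays would likely cut both the build and query times noticeably.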
This is very similar to what pgRouting does (as far as I know, it uses the C++ Boost Graph Library to store the graph in memory), except that pgRouting takes only about 2 seconds to calculate the shortest path on the same dataset (yes, that is fast, but it is a black box to me, so I need my own implementation).
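For reference, the pgRouting call I am comparing against looks roughly like this when issued through JDBC. The pgr_dijkstra function and the column names in its inner query follow pgRouting 2.x conventions (older releases exposed shortest_path() instead), so treat this as an approximation rather than a version-exact invocation:

    import java.sql.*;

    public class PgRoutingCall {
        // Approximate pgRouting invocation; signature and columns are the
        // pgRouting 2.x form and may differ in other releases.
        public static void printRoute(Connection conn, long startNodeId, long endNodeId)
                throws SQLException {
            String sql = "SELECT seq, node, edge, cost "
                       + "FROM pgr_dijkstra("
                       + "'SELECT gid AS id, source, target, cost FROM ways', ?, ?, false)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, startNodeId);
                ps.setLong(2, endNodeId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("node %d via edge %d, cost %.2f%n",
                                rs.getLong("node"), rs.getLong("edge"),
                                rs.getDouble("cost"));
                    }
                }
            }
        }
    }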
But recently I found out about graph databases and about Neo4j. On their website they claim that "still being able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints you can in many cases abandon the normal approach of precomputing indexed views with K/V stores and be able to put routing into the critical path with the possibility to adapt to live conditions and build highly personalized and dynamic spatial services."
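For contrast, this is roughly what the same query could look like against Neo4j's embedded Java API, which ships a Dijkstra implementation in its graph-algo component. The CONNECTS relationship type and the "cost" property are placeholder names I made up, and the exact classes (GraphAlgoFactory, PathExpanders) vary across Neo4j versions, so this is a sketch, not a drop-in:

    import org.neo4j.graphalgo.GraphAlgoFactory;
    import org.neo4j.graphalgo.PathFinder;
    import org.neo4j.graphalgo.WeightedPath;
    import org.neo4j.graphdb.*;

    public class Neo4jRouter {

        // Hypothetical model: one node per junction, CONNECTS relationships
        // for ways, edge weight in a "cost" property -- all placeholder names.
        private static final RelationshipType CONNECTS =
                RelationshipType.withName("CONNECTS");

        public WeightedPath route(GraphDatabaseService db, Node from, Node to) {
            try (Transaction tx = db.beginTx()) {
                PathFinder<WeightedPath> finder = GraphAlgoFactory.dijkstra(
                        PathExpanders.forTypeAndDirection(CONNECTS, Direction.BOTH),
                        "cost");
                WeightedPath path = finder.findSinglePath(from, to);
                tx.success();
                return path;
            }
        }
    }

The appeal here is that the graph is already persisted as a graph, so there is no 2-3 second load-everything step before each routing session.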
So the question is: will a graph database be faster for my specific problem?
The problem has the following properties:
- the database consists of one table (ways);
- the only database query is the one that loads all the ways into memory (to build the graph);
- I do not need scalability, i.e. the graph is unlikely to grow.