Is a graph database better for shortest path algorithms?

My goal is to write a shortest path algorithm for a road network.

Currently, my architecture looks something like this: I store all the data in a PostgreSQL database with PostGIS enabled. I run a single `SELECT * FROM ways`, which takes less than 3 seconds on a table with 100,000 edges (ways), and then I apply a shortest path algorithm (in Java, Ruby, or whatever) to the graph that is now in memory. The second operation takes about 1.5 seconds on a graph with 100,000 edges.

So it takes:

  • 2-3 seconds to load all the ways from the database into memory and build the graph (the nodes are stored in the same table as the ways (edges));
  • 1-1.5 seconds to calculate the shortest path on the graph that is already in memory.
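To make the second step concrete, here is a minimal sketch of Dijkstra's algorithm over an in-memory adjacency list, roughly the shape of graph the question describes. This is my own illustration, not the asker's actual code; the node IDs and graph layout are made up.

```java
import java.util.*;

// Minimal sketch of step 2: Dijkstra's algorithm on a graph already in memory.
// The graph layout (int node IDs, int costs) is a simplifying assumption.
public class Dijkstra {
    // adjacency list: node -> list of {neighbor, cost} pairs
    static Map<Integer, List<int[]>> graph = new HashMap<>();

    static void addEdge(int from, int to, int cost) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new int[]{to, cost});
        graph.computeIfAbsent(to, k -> new ArrayList<>()).add(new int[]{from, cost}); // two-way road
    }

    static Map<Integer, Integer> shortestDistances(int source) {
        Map<Integer, Integer> dist = new HashMap<>();
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        dist.put(source, 0);
        pq.add(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int node = cur[0], d = cur[1];
            if (d > dist.getOrDefault(node, Integer.MAX_VALUE)) continue; // stale queue entry
            for (int[] e : graph.getOrDefault(node, Collections.emptyList())) {
                int nd = d + e[1];
                if (nd < dist.getOrDefault(e[0], Integer.MAX_VALUE)) {
                    dist.put(e[0], nd);
                    pq.add(new int[]{e[0], nd});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // tiny road network: 1 -2- 2 -2- 3, plus a direct 1 -5- 3 shortcut
        addEdge(1, 2, 2);
        addEdge(2, 3, 2);
        addEdge(1, 3, 5);
        System.out.println(shortestDistances(1).get(3)); // prints 4
    }
}
```

With a binary-heap priority queue this runs in O(E log V), which is consistent with the ~1.5 s figure for 100,000 edges being dominated by constant factors rather than asymptotics.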

This is very similar to what pgRouting does (as far as I know, it uses the C++ Boost graph library to hold the graph in memory), except that pgRouting takes only about 2 seconds in total to calculate the shortest path on the same dataset (yes, that is fast, but it is a black box to me, so I want my own implementation).

But recently I found out about graph databases and about Neo4j. On their website they claim to be "able to do these calculations at sub-second speeds on graphs of millions of streets and waypoints", and that "in many cases you can abandon the usual approach of precomputing indexes with K/V stores and be able to put routing in the critical path, with the ability to adapt to live conditions and build highly personalized and dynamic spatial services."

So the question is: will a graph database be faster for my specific problem?

The problem has the following properties:

  • the database consists of a single table (ways);
  • the only query to the database is to get all the ways into memory (to build the graph);
  • I do not need scalability, i.e. the graph is unlikely to grow.
4 answers

You certainly don’t need to reinvent the wheel if you use a graph database such as Neo4j. Many shortest path algorithms are built in, and they are designed to handle complications such as speed limits on particular roads, one-way roads, and so on. Also, how will you keep up the performance when your data grows 10 or 100 times? Given your total computation time of 3 seconds for 100,000 ways, it could take minutes for 1M ways, whereas Neo4j would answer in milliseconds.


I have no experience with graph databases, but judging by your question I can point out a few things.

First of all, the direct answer would be: "Build such a graph database and compare its performance with your solution." You can measure memory usage, runtime (speed), CPU usage, and possibly other indicators. That will give you enough data to make your decision.
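The measurement part of this advice can be sketched very simply in Java. The helper names below are illustrative, not from the question, and for serious numbers you would want a proper benchmarking harness (e.g. JMH) rather than a one-shot timer like this.

```java
// Minimal sketch of the "measure it" advice: time a task and sample heap
// usage around it. One-shot timing like this ignores JIT warm-up and GC,
// so treat the numbers as rough indicators only.
public class Bench {
    static long timeMillis(Runnable task) {
        long t0 = System.nanoTime();
        task.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long heapBefore = usedHeapBytes();
        long ms = timeMillis(() -> {
            // stand-in for "load all ways into memory and build the graph"
            int[] edges = new int[1_000_000];
            for (int i = 0; i < edges.length; i++) edges[i] = i;
        });
        System.out.println("elapsed ms: " + ms);
        System.out.println("heap delta bytes: " + (usedHeapBytes() - heapBefore));
    }
}
```

Running the same harness against both the PostGIS-based loader and a graph-database client would give directly comparable numbers.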

My other advice is to reconsider your method. The three properties of the problem that you described (one table, loading all the ways, and no need for scalability) hold in your current setup, but not in a graph database. It is a completely different programming paradigm, and you may have to adapt and translate your method to suit this special type of database. It is pointless to make performance (or any other) comparisons if you use a standard approach in a non-standard environment such as a graph database.

Summary: translate your problem into the terms of the graph database and configure it accordingly. After that, do a performance comparison between the two solutions.

My bet is that if you translate and model your problem appropriately for a graph database, it will give you better performance. Your classic store-then-read-then-compute approach is simple, but not nearly as efficient unless it is aggressively optimized.


The point of graph databases is not only performance, but more the concept: your routing algorithms deal with single-relational graphs (that is, graphs where all links are of the same type), whereas with graph databases you have a multi-relational graph.

This allows you, for example, to calculate the shortest path between nodes using only a certain type of edge, or avoiding another type.
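The multi-relational idea above can be sketched as a traversal that only follows edges whose type is in a caller-chosen set. This is my own illustration, not a real graph-database API; all class and method names are made up.

```java
import java.util.*;

// Sketch of a multi-relational graph: each edge carries a type, and the
// shortest-path search only follows a caller-chosen set of edge types.
public class TypedGraph {
    record Edge(int to, String type, int cost) {}
    private final Map<Integer, List<Edge>> adj = new HashMap<>();

    void addEdge(int from, int to, String type, int cost) {
        adj.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, type, cost));
    }

    // Dijkstra restricted to edges whose type is in `allowed`; -1 if unreachable
    int shortestPath(int src, int dst, Set<String> allowed) {
        Map<Integer, Integer> dist = new HashMap<>();
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        dist.put(src, 0);
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            if (cur[0] == dst) return cur[1];
            if (cur[1] > dist.getOrDefault(cur[0], Integer.MAX_VALUE)) continue;
            for (Edge e : adj.getOrDefault(cur[0], List.of())) {
                if (!allowed.contains(e.type())) continue; // skip forbidden edge types
                int nd = cur[1] + e.cost();
                if (nd < dist.getOrDefault(e.to(), Integer.MAX_VALUE)) {
                    dist.put(e.to(), nd);
                    pq.add(new int[]{e.to(), nd});
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        TypedGraph g = new TypedGraph();
        g.addEdge(1, 2, "road", 3);
        g.addEdge(1, 2, "ferry", 1);
        // cheapest route overall uses the ferry...
        System.out.println(g.shortestPath(1, 2, Set.of("road", "ferry"))); // prints 1
        // ...but restricting the traversal to roads changes the answer
        System.out.println(g.shortestPath(1, 2, Set.of("road")));          // prints 3
    }
}
```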

For more information, you should read about the algebra behind graph databases and the concept of pipes.

I highly recommend the TinkerPop project as a starting point with graph databases.


A graph database will probably not load all your data into memory up front, but rather over time, since good ones are designed to work with extremely large data sets. However, once the data is there, a graph database should do less work than a relational database to follow the links. This is because it can access related objects directly by their identifiers, instead of going through B-tree indexes and (possibly) a join table, so it should be faster once the nodes and edges are cached.
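The "follow links directly" point (often called index-free adjacency) can be illustrated in a few lines: once a node object is in memory, each hop is a plain reference chase, with no per-hop index lookup or join. The class and method names here are illustrative only.

```java
import java.util.*;

// Tiny illustration of index-free adjacency: neighbors are reached through
// direct object references, not through a per-hop index or join lookup.
public class AdjacencyDemo {
    static class Node {
        final long id;
        final List<Node> neighbors = new ArrayList<>(); // direct pointers, no join table
        Node(long id) { this.id = id; }
    }

    // follow `hops` references from `start`, always taking the first neighbor
    static Node walk(Node start, int hops) {
        Node cur = start;
        for (int i = 0; i < hops; i++) {
            cur = cur.neighbors.get(0); // O(1) pointer chase per hop
        }
        return cur;
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        a.neighbors.add(b);
        b.neighbors.add(c);
        System.out.println(walk(a, 2).id); // prints 3
    }
}
```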

