Does it make sense to map a graph data structure to a relational database?

Question

Does it make sense to map a graph data structure to a relational database?

In particular, a multigraph.

Some colleague suggested this, and I am completely puzzled.

Any ideas on this?

+6

computer-science database graph-theory

Rodrogo Dec 30 '10 at 17:13


4 answers

This is an acceptable approach. You need to think about how this information will be processed. You will most likely need a language separate from your database to perform the graph computations that this kind of data implies. Skiena's The Algorithm Design Manual has an extensive section on graph data structures and their manipulation.

Without considering what types of queries you might run, start with two tables: vertices and edges. The vertices are simple: an identifier and a name. The edges are more complex, given that this is a multigraph. Each edge must be uniquely identified by the combination of its two vertices (i.e., foreign keys) and some additional information. The additional information depends on the problem you are solving. For example, for flight information it would be the departure and arrival times and the airline. In addition, you will need to decide whether an edge is directed (i.e., one way) or not, and keep track of that.
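The two-table layout described above might look roughly like the following sketch. It uses Python's built-in sqlite3 module only so that it can be run as-is; the column names (from_id, to_id, airline, departs, arrives), the airport codes, and the choice of key are assumptions made for the flight example, not something the answer prescribes. Edges are treated as directed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE vertices (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Hypothetical edge attributes for the flight example.
    -- The two vertex ids alone are not unique in a multigraph, so the
    -- key also includes the extra information that tells edges apart.
    CREATE TABLE edges (
        from_id INTEGER NOT NULL REFERENCES vertices(id),
        to_id   INTEGER NOT NULL REFERENCES vertices(id),
        airline TEXT NOT NULL,
        departs TEXT NOT NULL,
        arrives TEXT NOT NULL,
        PRIMARY KEY (from_id, to_id, airline, departs)
    );
""")

conn.executemany("INSERT INTO vertices VALUES (?, ?)", [(1, "BOS"), (2, "SFO")])
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, "AirA", "08:00", "11:30"),  # two parallel edges BOS -> SFO
        (1, 2, "AirB", "09:15", "12:40"),
        (2, 1, "AirA", "13:00", "21:20"),  # one edge in the other direction
    ],
)

# Count the parallel edges from vertex 1 to vertex 2.
parallel = conn.execute(
    "SELECT COUNT(*) FROM edges WHERE from_id = ? AND to_id = ?", (1, 2)
).fetchone()[0]
print(parallel)
```

If the extra attributes cannot be relied on to be unique, an autogenerated edge identifier would be a simpler key.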

Depending on the calculations, you may end up with a problem that is better solved using some kind of artificial intelligence / machine learning algorithm, for example finding optimal flights. The book Programming Collective Intelligence has some useful algorithms for this purpose. But where the data is stored does not change the algorithm itself.

+2

orangepips Dec 30 '10 at 21:54


Well, the information needs to be stored somewhere, and a relational database is not a bad idea.

It would just be a many-to-many relationship: a table of nodes and an edge (join) table.

+1

Benjamin anderson Dec 30 '10 at 17:22


Consider how Facebook might implement its social graph in a database. It might have a table for people and another table for friendships. The friendships table has at least two columns, each of which is a foreign key to the people table.

Since friendships are symmetric (on Facebook), they can ensure that the ID in the first foreign key is always less than the ID in the second foreign key. Twitter has a directed graph for its social network, so it would not use a canonical representation like this.
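A minimal sketch of that canonical ordering, assuming hypothetical table and column names (people, friendships, low_id, high_id) and using Python's sqlite3 module so it can be run directly. This only illustrates the idea and says nothing about how Facebook actually stores its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendships (
        low_id  INTEGER NOT NULL REFERENCES people(id),
        high_id INTEGER NOT NULL REFERENCES people(id),
        PRIMARY KEY (low_id, high_id),
        CHECK (low_id < high_id)  -- enforce the canonical order
    );
""")

def add_friendship(conn, a, b):
    """Store the pair with the smaller id first; a repeated pair is skipped."""
    low, high = min(a, b), max(a, b)
    conn.execute("INSERT OR IGNORE INTO friendships VALUES (?, ?)", (low, high))

conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
add_friendship(conn, 2, 1)  # stored as (1, 2)
add_friendship(conn, 1, 2)  # same friendship, so no second row is added
rows = conn.execute("SELECT low_id, high_id FROM friendships").fetchall()
print(rows)
```

Because both orderings collapse to one row, the primary key is enough to prevent the same friendship from being stored twice.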

0

Jake mitchell Dec 30 '10 at 17:29


Tom anderson · Accepted Answer · 2010-12-30T22:16:53+0000

It is very convenient to store the graph in the database: you have a table for nodes and a table for edges, which acts as a many-to-many relationship table between a node table and itself. Like this:

    create table node (
        id integer primary key
    );

    create table edge (
        start_id integer references node,
        end_id integer references node,
        primary key (start_id, end_id)
    );

However, there are a few sticky points about storing a graph this way.

First, the edges in this schema are naturally directed: the start and the end are distinct. If your edges are undirected, you will have to either be careful when writing queries, or store two rows in the table for each edge, one in each direction (and then be careful when writing queries!). If you store a single row per edge, I would suggest normalizing the stored form, perhaps by always treating the node with the smaller ID as the start (and adding a check constraint to the table to enforce this). You could have a truly unordered representation by not having the edges refer to the nodes directly, but instead having a join table between them, but that does not seem like a great idea to me.
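To illustrate the "be careful when writing queries" point for undirected edges stored once in normalized form, here is a hedged sketch using Python's sqlite3 module. The check constraint and the neighbours helper are assumptions added for illustration. Note that a strict start_id < end_id check also rules out self-loops, which may or may not suit your graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY);
    CREATE TABLE edge (
        start_id INTEGER NOT NULL REFERENCES node(id),
        end_id   INTEGER NOT NULL REFERENCES node(id),
        PRIMARY KEY (start_id, end_id),
        CHECK (start_id < end_id)  -- normalized form: smaller id first
    );
    INSERT INTO node (id) VALUES (1), (2), (3), (4);
    INSERT INTO edge (start_id, end_id) VALUES (1, 2), (2, 3), (2, 4);
""")

def neighbours(conn, node_id):
    """Return the ids adjacent to node_id, looking at both edge columns."""
    rows = conn.execute(
        """
        SELECT end_id FROM edge WHERE start_id = :n
        UNION
        SELECT start_id FROM edge WHERE end_id = :n
        ORDER BY 1
        """,
        {"n": node_id},
    ).fetchall()
    return [row[0] for row in rows]

print(neighbours(conn, 2))
```

Node 2 appears as the end of one edge and the start of two others, so a query that checked only start_id would miss one of its neighbours.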

Second, the schema above has no way to represent a multigraph. You can extend it easily enough to do so: if the edges between a given pair of nodes are indistinguishable, the simplest thing is to add a count to each edge row, saying how many edges there are between the nodes in question. If they are distinguishable, you will need to add something to the edge table so that they can be told apart; an autogenerated edge ID may be the simplest option.
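A sketch of the count-per-row variant for indistinguishable parallel edges, again using Python's sqlite3 module. The column name edge_count and the add_edge helper are hypothetical, chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY);
    CREATE TABLE edge (
        start_id   INTEGER NOT NULL REFERENCES node(id),
        end_id     INTEGER NOT NULL REFERENCES node(id),
        edge_count INTEGER NOT NULL DEFAULT 1 CHECK (edge_count > 0),
        PRIMARY KEY (start_id, end_id)
    );
    INSERT INTO node (id) VALUES (1), (2);
""")

def add_edge(conn, start_id, end_id):
    """Add one more parallel edge: bump the count, or create the row."""
    cur = conn.execute(
        "UPDATE edge SET edge_count = edge_count + 1 "
        "WHERE start_id = ? AND end_id = ?",
        (start_id, end_id),
    )
    if cur.rowcount == 0:  # no existing row for this pair yet
        conn.execute(
            "INSERT INTO edge (start_id, end_id) VALUES (?, ?)",
            (start_id, end_id),
        )

for _ in range(3):
    add_edge(conn, 1, 2)  # three parallel edges 1 -> 2
add_edge(conn, 2, 1)      # one edge 2 -> 1

counts = conn.execute(
    "SELECT start_id, end_id, edge_count FROM edge ORDER BY start_id"
).fetchall()
print(counts)
```

For distinguishable edges, the count column would be dropped in favour of an autogenerated edge ID as the primary key, with (start_id, end_id) no longer unique.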

However, even once you have sorted out the storage, you still have the problem of working with the graph. If you want to do all your processing on objects in memory, and the database is purely for storage, then there is no problem. But if you want to run queries on the graph in the database, you will need to figure out how to do that in SQL, which has no built-in support for graphs and whose basic operations are not easily adapted to working with them. It can be done, especially if you have a database with recursive SQL support (PostgreSQL, Firebird, some of the proprietary databases), but it takes some thought. If you want to do this, my suggestion would be to post further questions about the specific queries.
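As one example of what recursive SQL makes possible, the sketch below finds every node reachable from a starting node with a recursive common table expression. It runs against SQLite through Python's sqlite3 module purely for convenience (SQLite is not one of the databases named above); the reachable_from helper and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY);
    CREATE TABLE edge (
        start_id INTEGER NOT NULL REFERENCES node(id),
        end_id   INTEGER NOT NULL REFERENCES node(id),
        PRIMARY KEY (start_id, end_id)
    );
    INSERT INTO node (id) VALUES (1), (2), (3), (4), (5);
    -- 1 -> 2 -> 3 -> 1 is a cycle; 4 -> 5 is not reachable from 1.
    INSERT INTO edge (start_id, end_id) VALUES (1, 2), (2, 3), (3, 1), (4, 5);
""")

def reachable_from(conn, source_id):
    """Return the ids reachable from source_id along directed edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE reachable(id) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT edge.end_id
            FROM edge JOIN reachable ON edge.start_id = reachable.id
        )
        SELECT id FROM reachable ORDER BY id
        """,
        (source_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(reachable_from(conn, 1))
```

The same query shape should carry over to other databases with recursive CTE support, although the exact syntax may differ slightly between them.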
