Comparison of relational databases and graph databases

Question

Can someone explain to me the advantages and disadvantages of a relational database like MySQL compared to a graph database like Neo4j?

In SQL, you have several tables with various IDs linking them. Then you have to join to connect the tables. From a novice's point of view, why would you design the database to require a join, rather than having the connections explicit as edges from the start, as with a graph database? Conceptually, it makes no sense to a beginner. Presumably there is a very technical, non-conceptual reason for this?

+50

sql relational-database graph-databases

user782220 Oct 24

4 answers

The key difference between a graph database and a relational database is that relational databases work with sets, while graph databases work with paths.

This shows up in unexpected and unhelpful ways for an RDBMS user. For example, when you try to emulate path operations (e.g. friends of friends) by recursively joining in a relational database, query latency grows unpredictably and massively, as does memory usage, not to mention that it tortures SQL to express those kinds of operations. More data means slower queries in a set-based database, even if you can delay the pain through judicious indexing.

As dan1111 hinted, most graph databases do not suffer from this kind of join pain because they express relationships at a fundamental level. That is, relationships physically exist on disk, and they are named, directed, and can themselves be decorated with properties (this is called the property graph model, see https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model ). This means that, if you chose to, you could look at the relationships on disk and see how they "join" entities. Relationships are therefore first-class entities in a graph database and are semantically far stronger than the implied relationships that are reified at runtime in a relational store.
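To make the friends-of-friends case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `person` and `friend` tables and the sample rows are hypothetical, invented for illustration; the point is only that each extra hop adds another self-join on the same table.

```python
import sqlite3

# Hypothetical schema: one row per directed FRIEND edge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friend (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (person_id, friend_id)
    );
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    INSERT INTO friend VALUES (1, 2), (2, 3), (2, 4), (3, 4);
""")

# Depth 2 needs the friend table twice; depth 3 would need it three times.
FOAF_SQL = """
    SELECT DISTINCT p.name
    FROM friend AS f1
    JOIN friend AS f2 ON f2.person_id = f1.friend_id
    JOIN person AS p  ON p.id = f2.friend_id
    WHERE f1.person_id = ?
    ORDER BY p.name
"""

def friends_of_friends(person_id):
    """Return the names reachable in exactly two FRIEND hops."""
    return [row[0] for row in conn.execute(FOAF_SQL, (person_id,))]

print(friends_of_friends(1))
```

Going one level deeper means editing the query text itself to add a third alias of `friend`, which is the "torturing SQL" part.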

So why should you care? For two reasons:

Graph databases are much faster than relational databases for connected data, which is a strength of the underlying model. A consequence of this is that query latency in a graph database is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored, thus defusing the join bomb.
Graph databases make modelling and querying much more pleasant, which means faster development and fewer WTF moments. For example, expressing friend-of-a-friend for a typical social network in Neo4j's Cypher query language is just MATCH (me)-[:FRIEND]->()-[:FRIEND]->(foaf) RETURN foaf .
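A rough sketch of why latency follows the explored part of the graph rather than the total data size. This is plain Python with a hypothetical in-memory adjacency map, not Neo4j's actual storage engine; the assumption is only that each node keeps direct references to its own outgoing FRIEND edges.

```python
# Each node holds its own outgoing FRIEND edges (an assumption of this sketch).
friends = {
    "alice": ["bob"],
    "bob": ["carol", "dave"],
    "carol": ["dave"],
    "dave": [],
    # Unrelated data: never visited by the traversal below.
    "erin": ["frank"],
    "frank": [],
}

def friends_of_friends(graph, me):
    """Follow two FRIEND hops from `me`, counting the edges touched."""
    touched = 0
    found = set()
    for friend in graph[me]:
        touched += 1
        for foaf in graph[friend]:
            touched += 1
            found.add(foaf)
    return sorted(found), touched

result, edges_touched = friends_of_friends(friends, "alice")
print(result, edges_touched)
```

Adding a million unrelated people to `friends` would not change `edges_touched` for this query, because the traversal only follows edges reachable from the start node.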

For a more thorough assessment of the strengths of graph databases versus relational stores, there is a (free!) O'Reilly book entitled Graph Databases (caveat: I am one of the authors of the book), available at http://graphdatabases.com . It will not be free forever, as it is a proper O'Reilly book, but we have permission from the publisher to give a lot of copies away, so grab it now.

+78

Jim Webber Jul 30 '13 at 9:17

dan1111 has already given an answer marked as correct. A couple of additional points are worth noting in passing.

First, in almost all implementations of graph databases, records are "pinned" because there are an unknown number of pointers pointing at the record in its current location. This means that a record cannot be shuffled to a new location without either leaving a forwarding address at the old location or breaking an unknown number of pointers.

Theoretically, one could shuffle all the records at once and figure out a way to locate and repair all the pointers. In practice, this is an operation that could take weeks on a large graph database, during which time the database would have to be offline. It is simply not feasible.

By contrast, in a relational database, records can be reshuffled on a fairly large scale, and the only thing that has to be done is to rebuild any indexes that have been affected. This is a fairly large operation, but nowhere near as large as the equivalent for a graph database.

The second point worth noting in passing is that the World Wide Web can be seen as a gigantic graph database. Web pages contain hyperlinks, and hyperlinks refer, among other things, to other web pages. The reference is via URLs, which act as pointers.

When a web page is moved to a different URL without leaving a forwarding address at the old URL, an unknown number of hyperlinks will be broken. These broken links then give rise to the dreaded "Error 404: Page Not Found" message, which interrupts the pleasure of so many surfers.
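The pinning argument can be sketched with two toy stores. Both classes below are hypothetical illustrations, not how any particular database lays out its files: `PointerStore` addresses records by physical position, so a moved record must leave a forwarding address, while `KeyedStore` finds rows through an index that can simply be rebuilt after a reshuffle.

```python
class PointerStore:
    """Records are addressed by slot position; links store those positions."""

    def __init__(self):
        self.slots = []

    def add(self, payload, links=()):
        self.slots.append({"payload": payload, "links": list(links)})
        return len(self.slots) - 1  # physical address of the new record

    def move(self, old_addr):
        """Relocate a record, leaving a forwarding address behind."""
        record = self.slots[old_addr]
        self.slots.append(record)
        new_addr = len(self.slots) - 1
        self.slots[old_addr] = {"forward_to": new_addr}
        return new_addr

    def read(self, addr):
        record = self.slots[addr]
        while "forward_to" in record:  # follow forwarding addresses
            record = self.slots[record["forward_to"]]
        return record


class KeyedStore:
    """Rows are found through an index on the key, not by address."""

    def __init__(self):
        self.rows = []
        self.index = {}

    def add(self, key, payload):
        self.rows.append((key, payload))
        self.index[key] = len(self.rows) - 1

    def reshuffle(self):
        """Move every row, then rebuild the index in one pass."""
        self.rows.reverse()
        self.index = {key: pos for pos, (key, _) in enumerate(self.rows)}

    def read(self, key):
        return self.rows[self.index[key]][1]


graph = PointerStore()
bob = graph.add("bob")
alice = graph.add("alice", links=[bob])
graph.move(bob)  # alice's stored pointer is now stale
print(graph.read(graph.read(alice)["links"][0])["payload"])

table = KeyedStore()
table.add(1, "bob")
table.add(2, "alice")
table.reshuffle()
print(table.read(1))
```

Without the `forward_to` entry, the pointer held by `alice` would land on whatever now occupies the old slot, which is the broken-pointer case described above.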

+11

Walter Mitty Oct 26 '12 at 5:12

With a relational database, we can model and query a graph using foreign keys and self-joins. Just because RDBMS contains the word relational does not mean that it handles relationships well. The word relational in RDBMS stems from relational algebra, not from relationship. In an RDBMS, the relationship itself does not exist as an object in its own right. It must be represented either explicitly as a foreign key, or implicitly as a value in a link table (when using a generic/universal modelling approach). The links between data sets are stored in the data itself.

The more we increase the search depth in a relational database, the more self-joins we need to perform, and the more our query performance suffers. The deeper we go into our hierarchy, the more tables we need to join, and the slower our query gets. Mathematically, the cost grows exponentially with depth in a relational database. In other words, the more complex our queries and relationships, the more we benefit from a graph database compared to a relational one. We do not have the same performance problems in a graph database when navigating the graph, because the graph database stores relationships as separate objects. However, the superior read performance comes at the cost of slower writes.

In some situations it is easier to change the data model in a graph database than in an RDBMS. For example, in an RDBMS, if I change a table relationship from 1:n to m:n, I need to apply DDL with potential downtime.

An RDBMS, on the other hand, has advantages in other areas, e.g. aggregating data or keeping data under version control.

I discuss some other pros and cons in my blog post about graph databases for data warehousing.
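A small sketch of how depth drives both the number of self-joins and the number of paths the join has to produce. The `edge` table and the binary-tree data are hypothetical, chosen so that every node has a fan-out of two; real data will have a different shape, so treat the doubling as an illustration of the trend, not a benchmark.

```python
import sqlite3

def n_hop_sql(depth):
    """Build a query that counts paths of exactly `depth` hops from one node."""
    joins = "".join(
        f" JOIN edge AS e{i} ON e{i}.src = e{i - 1}.dst"
        for i in range(2, depth + 1)
    )
    return f"SELECT COUNT(*) FROM edge AS e1{joins} WHERE e1.src = ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
# Hypothetical data: a complete binary tree, node n points to 2n and 2n+1.
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(n, child) for n in range(1, 32) for child in (2 * n, 2 * n + 1)],
)

def count_paths(depth):
    """Count the rows the chain of self-joins yields when starting at node 1."""
    return conn.execute(n_hop_sql(depth), (1,)).fetchone()[0]

for depth in range(1, 5):
    # depth, number of self-joins in the query, number of paths found
    print(depth, n_hop_sql(depth).count("JOIN"), count_paths(depth))
```

Each extra level adds one more `JOIN` to the query text, and with a constant fan-out the number of joined rows multiplies at every level.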
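As a rough illustration of the 1:n to m:n change, the sketch below uses sqlite3 with hypothetical `author` and `book` tables. The relationship starts as a foreign-key column and has to be moved into a new link table, which takes both DDL and a data copy. Dropping the old column is left out here because support for it depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:n: each book row carries exactly one author_id foreign key.
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO book VALUES (10, 'Graphs', 1), (11, 'Tables', 2);
""")

# m:n: the relationship moves into a new link table (DDL plus a data copy).
conn.executescript("""
    CREATE TABLE book_author (
        book_id INTEGER REFERENCES book(id),
        author_id INTEGER REFERENCES author(id),
        PRIMARY KEY (book_id, author_id)
    );
    INSERT INTO book_author SELECT id, author_id FROM book;
""")

# A second author for 'Graphs' is now just one more row in the link table.
conn.execute("INSERT INTO book_author VALUES (10, 2)")

authors_of_graphs = [
    row[0]
    for row in conn.execute(
        "SELECT a.name FROM book_author AS ba "
        "JOIN author AS a ON a.id = ba.author_id "
        "WHERE ba.book_id = 10 ORDER BY a.name"
    )
]
print(authors_of_graphs)
```

Every query that previously read `book.author_id` also has to be rewritten to go through `book_author`, which is where most of the migration effort tends to sit.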

+1

Uli Bethke Jun 16 '17 at 18:48

dan1111 · Accepted Answer · 2012-10-24 09:51

There is actually conceptual reasoning behind both styles. The Wikipedia articles on the relational model and on graph databases give good overviews of this.

The main difference is that in a graph database the relationships are stored at the individual record level, while in a relational database the structure is defined at a higher level (the table definitions).

This has important implications:

A relational database is much faster when operating on huge numbers of records. In a graph database, each record has to be examined individually during a query to determine the structure of the data, while in a relational database this is known ahead of time.
Relational databases use less storage space because they do not have to store all of those relationships.

Storing all relationships at the individual record level only makes sense if there is going to be a lot of variation in the relationships; otherwise, you are just duplicating the same things over and over. This means that graph databases are well suited to irregular, complex structures. But in the real world, most databases require regular, relatively simple structures. This is why relational databases predominate.
