Significant performance difference between neo4j direct access and through OGM

Question

I am evaluating the performance of the Neo4j graph database with a simple test that inserts, updates, deletes, and queries. Using Neo4j OGM, I see significantly slower execution times (about 2-4 times slower) compared to direct access through the Neo4j driver. For example, the delete operation (see code below) takes 500 ms with the driver versus 1200 ms with OGM for 10K nodes and 11K relationships on my machine. I wonder why this happens, especially because the code below does not even use any node object for the deletion. I can imagine that OGM has some overhead, but this seems too large. Does anyone have an idea why it is slower?

Node example:

    public abstract class AbstractBaseNode {
        @GraphId
        @Index(unique = true)
        private Long id;

        public Long getId() {
            return id;
        }
    }

    @NodeEntity
    public class Company extends AbstractBaseNode {
        private String name;

        public Company(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

Sample code to remove using the driver:

    driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
    session = driver.session();

    long start = System.nanoTime();
    session.run("MATCH (n) DETACH DELETE n").list();
    System.out.println("Deleted all nodes " + ((System.nanoTime() - start) / 1000) + "μs");

Sample code for removal via OGM:

    public org.neo4j.ogm.config.Configuration neo4jConfiguration() {
        org.neo4j.ogm.config.Configuration config = new org.neo4j.ogm.config.Configuration();
        config.autoIndexConfiguration().setAutoIndex(AutoIndexMode.DUMP.getName());
        config.driverConfiguration()
            .setDriverClassName("org.neo4j.ogm.drivers.bolt.driver.BoltDriver")
            .setURI("bolt://neo4j:secret@localhost")
            .setConnectionPoolSize(10);
        return config;
    }

    sessionFactory = new SessionFactory(neo4jConfiguration(), "net.mypackage");
    session = sessionFactory.openSession();

    long start = System.nanoTime();
    session.query("MATCH (n) DETACH DELETE n", Collections.emptyMap()).forEach(x -> {});
    System.out.println("Deleted all nodes " + ((System.nanoTime() - start) / 1000) + "μs");

java neo4j neo4j-ogm

Steffen harbich May 11 '17 at 9:10

1 answer

Tezra · Answer 1 · 2017-05-19T17:51:29+0000

To begin with, your test samples are bad. When timing something, you want to stress the system so that the run takes a substantial amount of time. The tests should also measure what you are actually interested in (are you checking how quickly you can create and delete connections? Maximum Cypher throughput? The speed of a single large transaction?). With tests that take barely a second, it is impossible to tell whether you are measuring the query call or just start-up overhead (despite the name, the session doesn't actually connect until you call query(...)).
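One way to keep start-up overhead out of the numbers is to run the operation a few times untimed before measuring, and then to look at several timed runs rather than a single one. The following is only a minimal sketch of that idea in plain Java: the class name `QueryBenchmark`, its methods, and the stand-in workload are hypothetical, and in a real test the `Runnable` would wrap `session.run(...)` or `session.query(...)`, with the test data recreated outside the timed region.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Neo4j driver or OGM.
public class QueryBenchmark {

    /** Runs op untimed `warmups` times, then returns the duration in ns of each of `runs` timed calls. */
    public static long[] measure(Runnable op, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            op.run(); // absorbs connection setup, class loading and JIT warm-up; not measured
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Median of the samples (the upper median when the count is even). */
    public static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload so the sketch runs without a database.
        Runnable standIn = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        long[] samples = measure(standIn, 5, 20);
        System.out.println("median: " + (median(samples) / 1000) + "μs");
    }
}
```

Comparing medians of the driver and OGM variants measured this way should show whether the gap comes from the query itself or from the first call.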

As far as I can tell, both versions perform approximately the same in a normal setup. The only thing I can think of that would affect this is OGM doing something that starves other processes of system resources.

UPDATE

    UNWIND {rows} as row
    CREATE (n:Company)
    SET n=row.props
    RETURN row.nodeRef as ref, ID(n) as id, row.type as type

    with params {rows=[{nodeRef=-1206180304, type=node, props={name=company_1029}}]}

VS

 CREATE (a:Company {name: {name}}) // X10,000

The key difference between the driver and OGM is that the driver does exactly what you tell it to, which is the most efficient way to do things, whereas OGM tries to manage the query logic for you (what to return, how to save things, what to try to save). The OGM version is more robust because it will automatically try to merge nodes into the database (if possible) and save only the things that have actually changed. Since your node class does not have a primary key to merge on, it has to create everything.

The Cypher that OGM generates is more versatile, but it also costs more in memory usage and db accesses. SET n.name="rawr" costs 1 db hit per property. SET n={name:"rawr"} costs 3 db hits, though (roughly 1 + 2 * number_of_props, so {name:"rawr", id:2} costs 5 db hits). This is why the OGM Cypher is slower.

However, OGM has smart change management: if you load a node, edit something on it, and then save it, with the driver you either have to save everything or implement your own manager, while OGM saves only what was updated.
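The db-hit rule of thumb quoted above (about 1 hit per property for a per-property SET, about 1 + 2 * number of properties for a map SET) can be written out as plain arithmetic. This is only an illustration of the answer's own figures: the class `DbHitEstimate` and its methods are hypothetical, and real db-hit counts depend on the Neo4j version and the query plan, so they should be checked by running the query with PROFILE.

```java
// Hypothetical illustration of the rule of thumb from the answer; not a Neo4j API.
public class DbHitEstimate {

    /** SET n.a = ..., n.b = ...: roughly one db hit per property written. */
    public static int perPropertySet(int numberOfProps) {
        return numberOfProps;
    }

    /** SET n = {map}: roughly 1 + 2 * number of properties in the map. */
    public static int mapSet(int numberOfProps) {
        return 1 + 2 * numberOfProps;
    }

    public static void main(String[] args) {
        for (int props = 1; props <= 2; props++) {
            System.out.println(props + " prop(s): per-property SET ~" + perPropertySet(props)
                    + " db hit(s), map SET ~" + mapSet(props) + " db hits");
        }
    }
}
```

For the two examples in the answer, this gives 3 for {name:"rawr"} and 5 for {name:"rawr", id:2}, matching the numbers quoted there.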

In short, the Cypher that OGM generates is less efficient than what you would write by hand for the driver, but OGM has built-in smart management that can make it faster than a naive driver implementation in real business-logic situations (loading and editing a large number of nodes). Of course, you can implement your own management on top of the driver to be faster still, so this is a trade-off between speed and development effort. The more speed you want, the more time you have to spend managing every tiny aspect (and the point of OGM is that you plug it in and it just works).
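To give a rough idea of what implementing your own management on top of the driver involves, here is a minimal sketch of change tracking: remember each node's properties when it is loaded, and at save time send only the properties that differ. The class `ChangeTracker` and its methods are hypothetical and are not how OGM is implemented internally; the sketch also ignores removed properties and relationships.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical hand-rolled change tracker; not part of the Neo4j driver or OGM.
public class ChangeTracker {

    // Snapshot of each node's properties as they were when loaded, keyed by node id.
    private final Map<Long, Map<String, Object>> snapshots = new HashMap<>();

    /** Records a copy of the properties at load time. */
    public void onLoad(long nodeId, Map<String, Object> props) {
        snapshots.put(nodeId, new HashMap<>(props));
    }

    /** Returns only the properties whose values differ from the snapshot (all of them for an unknown node). */
    public Map<String, Object> changedProps(long nodeId, Map<String, Object> current) {
        Map<String, Object> before = snapshots.getOrDefault(nodeId, Collections.emptyMap());
        Map<String, Object> changed = new HashMap<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(before.get(entry.getKey()), entry.getValue())) {
                changed.put(entry.getKey(), entry.getValue());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        ChangeTracker tracker = new ChangeTracker();
        Map<String, Object> company = new HashMap<>();
        company.put("name", "company_1");
        company.put("city", "Berlin");
        tracker.onLoad(1L, company);

        company.put("name", "company_1_renamed");
        // Only the renamed property is reported as changed.
        System.out.println(tracker.changedProps(1L, company));
    }
}
```

The resulting map could then be passed as a parameter to a statement along the lines of `MATCH (n) WHERE ID(n) = {id} SET n += {props}`, so that only the edited properties are written.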
