Neo4j Merge does not use a unique constraint index

Neo4j Version 2.2.4

I use LOAD CSV to import a huge collection of nodes and edges. I use MERGE to match or create the nodes. For performance, I also created a unique constraint (and with it an index) on the node property.

    CREATE CONSTRAINT ON (e:RESSOURCE) ASSERT e.url IS UNIQUE;

    USING PERIODIC COMMIT 10000
    LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
    MERGE (subject:RESSOURCE {url: trim(line[0])})
    MERGE (object:RESSOURCE {url: trim(line[1])})
    CREATE (subject)-[:EQUIVALENCE]->(object);

The problem is that import performance for around 1 million edges is very bad. I have profiled the import as well as individual MERGE queries, and I have not seen any use of the unique index. In contrast, a MATCH query does use the index. What can I do to make MERGE use the index?

2 answers

Peter is right; here is one more explanation:

You have run into the Eager problem: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/. You should see it in the EXPLAIN output (remove the PERIODIC COMMIT and prefix the query with EXPLAIN):

    +--------------+----------------------------------+-----------------------+
    | Operator     | Identifiers                      | Other                 |
    +--------------+----------------------------------+-----------------------+
    | +EmptyResult |                                  |                       |
    | |            +----------------------------------+-----------------------+
    | +UpdateGraph | anon[179], line, object, subject | CreateRelationship    |
    | |            +----------------------------------+-----------------------+
    | +UpdateGraph | line, object, subject            | MergeNode; :RESSOURCE |
    | |            +----------------------------------+-----------------------+
    | +Eager       | line, subject                    |                       |
    | |            +----------------------------------+-----------------------+
    | +UpdateGraph | line, subject                    | MergeNode; :RESSOURCE |
    | |            +----------------------------------+-----------------------+
    | +LoadCSV     | line                             |                       |
    +--------------+----------------------------------+-----------------------+

The Eager operator pulls in your entire CSV file before the plan continues, in order to provide isolation, and this effectively disables your periodic commit.
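
To reproduce a plan like the one above, the statement from the question can be run with EXPLAIN in place of USING PERIODIC COMMIT. This is only a sketch that reuses the file path and labels from the question; the thing to look for is an Eager operator sitting between the two MergeNode steps.

```cypher
EXPLAIN
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MERGE (subject:RESSOURCE {url: trim(line[0])})
MERGE (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);
```

EXPLAIN only prints the plan and does not execute the import, so it is safe to run against the full file.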

If you split the import into two passes (nodes first, then relationships), you can try the following. Note that the upper bound of a list slice is exclusive, so line[0..2] is needed to cover both URL columns:

    CREATE CONSTRAINT ON (e:RESSOURCE) ASSERT e.url IS UNIQUE;

    USING PERIODIC COMMIT 10000
    LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
    FOREACH (url IN line[0..2] |
      MERGE (subject:RESSOURCE {url: trim(url)})
    );

    USING PERIODIC COMMIT 10000
    LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
    MATCH (subject:RESSOURCE {url: trim(line[0])})
    MATCH (object:RESSOURCE {url: trim(line[1])})
    CREATE (subject)-[:EQUIVALENCE]->(object);
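
As a rough sanity check after both passes, the node and relationship counts can be compared with the input file. This is a sketch; the aliases `nodes` and `edges` are my own naming. Assuming every row contains two URLs that were both merged in the first pass, the relationship count should equal the number of rows in the file.

```cypher
// Count the :RESSOURCE nodes created by the first pass.
MATCH (n:RESSOURCE) RETURN count(n) AS nodes;

// Count the :EQUIVALENCE relationships created by the second pass.
MATCH (:RESSOURCE)-[r:EQUIVALENCE]->(:RESSOURCE) RETURN count(r) AS edges;
```

A relationship count lower than the row count would suggest that some MATCH in the second pass did not find its node.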

Try the following:

    MATCH (subject:RESSOURCE {url: trim(line[0])}),
          (object:RESSOURCE {url: trim(line[1])})
    MERGE (subject)-[:EQUIVALENCE]->(object)

Edit: I see you also want to merge the nodes. I would suggest doing a separate MERGE for each node:

 MERGE (subject:RESSOURCE {url: trim(line[0])}) 

I also recommend trimming the values when you generate the CSV file, so that Neo4j does not have to do it for every row and the Cypher becomes simpler.
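
For illustration, if the values in the file are already free of surrounding whitespace, the node MERGE needs no trim() call. The sketch below assumes a pre-trimmed file at the same path as in the question.

```cypher
// Assumes the CSV values were trimmed when the file was generated,
// so each URL can be used as-is.
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MERGE (subject:RESSOURCE {url: line[0]});
```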

Edit 2 (thanks to commenter Kai, who corrected my MERGE expression):

If you want a more complex MERGE that also sets further properties, you can do this:

    MERGE (subject:RESSOURCE {url: trim(line[0])})
    ON CREATE SET subject.source = trim(line[1])
    ON MATCH SET subject.source = trim(line[1])