Batch insert with Neo4j

I am importing 2.3 billion relationships from a table. The import is slow: at about 5 million relationships per hour, the migration would take roughly 20 days. I have heard about Neo4j batch inserts and the batch insert utility. The utility does interesting things by importing from a CSV file, but the latest code is somehow broken and does not run.

I have about 100M relationships in Neo4j, and I have to check that no duplicate relationships are created.

How can I speed this up in Neo4j?

My current code looks like this:

 begin transaction for 50K relationships
   create or get user node for user A
   create or get user node for user B
   check whether there is a KNOW relationship between A and B
   if not, create the relationship
 end transaction

I also read the following:

2 answers

For the relationships, and assuming you have enough memory, I would not try to create unique relationships during the import phase. Right now I am importing an SQL table with about 3 million records, and I always create the relationship, whether or not it is a duplicate.

Later, after the import, you can simply run a Cypher query that creates unique relationships, like this:

 START n=node(*) MATCH n-[:KNOW]-m CREATE UNIQUE n-[:KNOW2]-m; 

and

 START r=rel(*) where type(r)='KNOW' delete r; 

At least this is my approach, and running the Cypher query afterwards takes about a minute. A problem may arise when you have a very large number of nodes: the Cypher query can then run out of memory (this depends on how much cache you have configured for the Neo4j engine).


How do you do "get user node for user A"? By looking it up in an index? Index lookups really slow down batch inserts. Try to cache as many users as possible in a simple HashMap "in front of" the index, or use BatchInserterIndex#setCacheCapacity.
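
A minimal sketch of the HashMap idea, assuming users are identified by a string key and nodes by a long id. The class name `UserNodeCache` and the `slowLookup` function are hypothetical stand-ins for whatever index lookup or node creation call the import actually uses; nothing here touches the Neo4j API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Caches user key -> node id in front of a slower lookup (hypothetical helper). */
class UserNodeCache {
    private final Map<String, Long> cache = new HashMap<>();
    // Assumption: stands in for the real index lookup or "create node" call.
    private final ToLongFunction<String> slowLookup;
    private long misses = 0;

    UserNodeCache(ToLongFunction<String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns the cached node id; calls the slow lookup only on a cache miss. */
    long getOrCreate(String userKey) {
        Long id = cache.get(userKey);
        if (id == null) {
            misses++;
            id = slowLookup.applyAsLong(userKey);
            cache.put(userKey, id);
        }
        return id;
    }

    /** Number of times the slow lookup had to be called. */
    long misses() {
        return misses;
    }
}
```

With this in place, each user costs one slow lookup no matter how many relationships mention it. The map stores boxed values, so for a very large number of users the memory footprint may become the limiting factor.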
