I need to insert a huge number of nodes, with the relationships between them, into Neo4j through the REST API batch endpoint, at about 5 thousand records/s (and still increasing).
This will be a 24x7 continuous insert. Some records require creating only one node; others require creating two nodes and one relationship between them.
Can I improve insert performance by changing my procedure or changing my Neo4j settings?
My progress:
1. I tested Neo4j for a while, but I couldn’t get the required performance
Test server: 24 cores + 32 GB RAM
Neo4j 2.0.0-M06 is installed as a standalone service.
My Java application runs on the same server for now (in the future, Neo4j and the Java application will have to run on separate servers, so embedded mode cannot be used)
REST API endpoint: /db/data/batch (target: /cypher)
Using Schema Index, Constraints, MERGE, CREATE UNIQUE.
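For reference, a minimal sketch of how one batch could be posted to that endpoint from Java. It assumes Java 11's `java.net.http` client and a server on `localhost:7474` (Neo4j's default HTTP port); the class and method names are hypothetical and not part of any Neo4j library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of any Neo4j driver.
final class BatchRequestFactory {
    // Assumed server address; adjust host and port to the real deployment.
    static final String BATCH_URL = "http://localhost:7474/db/data/batch";

    // Wraps a JSON array of job descriptions in a POST to the batch endpoint.
    static HttpRequest newBatchRequest(String jobsJson) {
        return HttpRequest.newBuilder(URI.create(BATCH_URL))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jobsJson))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; reusing one client instance across batches avoids opening a new connection for every call.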
2. My schema:
neo4j-sh (0)$ schema
==> Indexes
==> ON :REPLY(created_at) ONLINE
==> ON :REPLY(ids) ONLINE (for uniqueness constraint)
==> ON :REPOST(created_at) ONLINE
==> ON :REPOST(ids) ONLINE (for uniqueness constraint)
==> ON :Post(userId) ONLINE
==> ON :Post(postId) ONLINE (for uniqueness constraint)
==>
==> Constraints
==> ON (post:Post) ASSERT post.postId IS UNIQUE
==> ON (repost:REPOST) ASSERT repost.ids IS UNIQUE
==> ON (reply:REPLY) ASSERT reply.ids IS UNIQUE
3. My Cypher queries and JSON requests
3.1. When a record requires creating a single node, the job description is as follows:
{"method" : "POST","to" : "/cypher","body" : {"query" : "MERGE (child:Post {postId:1001, userId:901})"}}
3.2. When a record requires two nodes and one relationship, the job description is as follows:
{"method" : "POST","to" : "/cypher","body" : {"query" : "MERGE (parent:Post {postId:1002, userId:902}) MERGE (child:Post {postId:1003, userId:903}) CREATE UNIQUE parent-[relationship:REPOST {ids:'1002_1003', created_at:'Wed Nov 06 14:06:56 AST 2013' }]->child"}}
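As an illustration, the two job descriptions above could be assembled in Java roughly as follows. This is only a sketch: it passes the values through the `params` map of the `/cypher` body (using the `{name}` parameter syntax of Neo4j 2.0) instead of inlining literals, which is a variation on the requests shown above and not what they currently do. The class name, method names, and the optional per-job `id` field are illustrative choices.

```java
import java.util.List;

// Hypothetical helper that assembles the batch payload by hand (no JSON library).
final class BatchJobs {
    // Case 3.1: MERGE one Post node; values travel in "params", not in the query text.
    static String singleNodeJob(int id, long postId, long userId) {
        return "{\"method\":\"POST\",\"to\":\"/cypher\",\"id\":" + id
             + ",\"body\":{\"query\":\"MERGE (child:Post {postId:{postId}, userId:{userId}})\""
             + ",\"params\":{\"postId\":" + postId + ",\"userId\":" + userId + "}}}";
    }

    // Case 3.2: MERGE two Post nodes and CREATE UNIQUE one REPOST relationship.
    static String repostJob(int id, long pId, long pUser, long cId, long cUser, String createdAt) {
        String query = "MERGE (parent:Post {postId:{pId}, userId:{pUser}}) "
             + "MERGE (child:Post {postId:{cId}, userId:{cUser}}) "
             + "CREATE UNIQUE parent-[relationship:REPOST {ids:{ids}, created_at:{createdAt}}]->child";
        return "{\"method\":\"POST\",\"to\":\"/cypher\",\"id\":" + id
             + ",\"body\":{\"query\":\"" + query + "\",\"params\":{"
             + "\"pId\":" + pId + ",\"pUser\":" + pUser
             + ",\"cId\":" + cId + ",\"cUser\":" + cUser
             + ",\"ids\":\"" + pId + "_" + cId + "\""
             + ",\"createdAt\":\"" + escape(createdAt) + "\"}}}";
    }

    // Escapes backslashes and double quotes so the value is a valid JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Joins individual job descriptions into the JSON array the batch endpoint expects.
    static String toBatch(List<String> jobs) {
        return "[" + String.join(",", jobs) + "]";
    }
}
```

Keeping the query text constant across jobs means only the parameter values change from record to record, so the server sees the same two query strings for every batch.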
3.3. I usually send 100 job descriptions (a mix of 3.1 and 3.2) per batch, and each batch takes about 150-250 ms.
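To put those timings next to the 5 thousand records/s target, here is a small back-of-the-envelope calculation in Java. It assumes batches are sent back to back on one connection and that throughput would scale linearly with the number of batches in flight, which real servers rarely do; the class and method names are hypothetical.

```java
// Hypothetical helper for rough capacity estimates.
final class BatchThroughput {
    // Records per second achieved by one connection sending batches back to back.
    static double recordsPerSecond(int batchSize, double batchMillis) {
        return batchSize * 1000.0 / batchMillis;
    }

    // Smallest number of batches in flight needed to reach the target,
    // under the (optimistic) assumption of perfectly linear scaling.
    static int requiredConcurrency(double targetPerSecond, int batchSize, double batchMillis) {
        return (int) Math.ceil(targetPerSecond / recordsPerSecond(batchSize, batchMillis));
    }
}
```

With 100 jobs per batch, 150 ms per batch gives about 667 records/s and 250 ms gives 400 records/s on a single connection, so the target would need roughly 8 to 13 batches in flight at once under that linear-scaling assumption.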
4. Performance issues
4.1. Concurrency:
/db/data/batch (target:/cypher), -, , , Neo4j (-) ().
4.2. MERGE .
( 3.2), - ; - CypherExecutionException , node xxxx aaaa "bbbbb" = [ccccc]; , MERGE , node, .
, .
GitHub , https://github.com/neo4j/neo4j/issues/1428
4.3. CREATE UNIQUE .
github.
4.4. :
, cypher, get_or_create (/db/data/index/node/Post?uniqueness=get_or_create /db/data/index/relationship/XXXXX?uniqueness=get_or_create)
- ( ), ( node created )
, auto_indexing , 2.0.0, , + cypher + .
HOWEVER, + cypher, 200 , , MERGE , , 600 ~ 800/, 5 /.
- , .
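For the get_or_create endpoint mentioned in 4.4, a batch job description could look roughly like the one built below. This is a sketch under assumptions: the index name `Post` comes from the URL above, while the indexed key `postId`, the string form of the indexed value, and the class and method names are illustrative guesses, not taken from the original requests.

```java
// Hypothetical helper; builds one job for the legacy-index unique-node endpoint.
final class GetOrCreateJobs {
    // Creates the Post node only if no node is indexed under postId=<postId> yet.
    static String uniquePostJob(int id, long postId, long userId) {
        return "{\"method\":\"POST\""
             + ",\"to\":\"/index/node/Post?uniqueness=get_or_create\""
             + ",\"id\":" + id
             // "key" and "value" identify the index entry (assumed: postId as a string).
             + ",\"body\":{\"key\":\"postId\",\"value\":\"" + postId + "\""
             // "properties" are set on the node if it has to be created.
             + ",\"properties\":{\"postId\":" + postId + ",\"userId\":" + userId + "}}}";
    }
}
```

Jobs of this kind go into the same JSON array as the Cypher jobs, so one batch can mix both styles.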