Optimizing high volume insertion in Neo4j using REST

I need to insert a huge number of nodes, with the relationships between them, into Neo4j through the REST API's batch endpoint, at about 5 thousand records/s (and still increasing).

This will be a continuous 24x7 insert. Some records require creating only one node, while others require creating two nodes and one relationship.

Can I improve insert performance by changing my procedure or changing my Neo4j settings?

My progress:

1. I have been testing Neo4j for a while, but I couldn't get the performance I need.

Test server: 24 cores + 32 GB RAM

Neo4j 2.0.0-M06 is installed as a standalone service.

My Java application runs on the same server. (In the future, Neo4j and the Java application will each run on their own server, so embedded mode cannot be used.)

REST API endpoint: /db/data/batch (target: /cypher)

Using Schema Index, Constraints, MERGE, CREATE UNIQUE.

2. My schema:

neo4j-sh (0)$ schema
==> Indexes
==>   ON :REPLY(created_at)   ONLINE                             
==>   ON :REPLY(ids)          ONLINE (for uniqueness constraint) 
==>   ON :REPOST(created_at) ONLINE                             
==>   ON :REPOST(ids)        ONLINE (for uniqueness constraint) 
==>   ON :Post(userId)      ONLINE                             
==>   ON :Post(postId)    ONLINE (for uniqueness constraint) 
==> 
==> Constraints
==>   ON (post:Post) ASSERT post.postId IS UNIQUE
==>   ON (repost:REPOST) ASSERT repost.ids IS UNIQUE
==>   ON (reply:REPLY) ASSERT reply.ids IS UNIQUE

3. My cypher queries and JSON requests

3.1. When a single record requires creating a single node, the job description is as follows

{"method" : "POST","to" : "/cypher","body" : {"query" : "MERGE (child:Post {postId:1001, userId:901})"}}

3.2. When a single record requires two nodes and one relationship, the job description is as follows

{"method" : "POST","to" : "/cypher","body" : {"query" : "MERGE (parent:Post {postId:1002, userId:902}) MERGE (child:Post {postId:1003, userId:903}) CREATE UNIQUE parent-[relationship:REPOST {ids:'1002_1003', created_at:'Wed Nov 06 14:06:56 AST 2013' }]->child"}}

3.3. I usually send 100 job descriptions (a mix of 3.1 and 3.2) per batch, which takes about 150-250 ms.

4. Performance issues

4.1. Concurrency:

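/db/data/batch (target: /cypher) does not seem to be thread-safe: when I tested with two or more concurrent threads, the Neo4j server went down within second(s).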

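4.2. MERGE with constraints does not always work.

When creating two nodes and one relationship in a single query (as in 3.2), it sometimes works; but sometimes it fails with a CypherExecutionException saying that node xxxx already exists with label aaaa and property "bbbbb" = [ccccc]. As I understand it, MERGE should not throw an exception but return the node if it already exists.

As a result of the exception, the whole batch fails and is rolled back, which hurts my insert rate.

I have opened an issue on GitHub for this: https://github.com/neo4j/neo4j/issues/1428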

4.3. CREATE UNIQUE with constraints does not always work for relationship creation.

This is covered in the same GitHub issue.

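4.4. Performance:

Before using batch with cypher, I tried the legacy index with get_or_create (/db/data/index/node/Post?uniqueness=get_or_create and /db/data/index/relationship/XXXXX?uniqueness=get_or_create).

Because of the nature of those legacy index endpoints (they return the location of the data in the index rather than in the actual data store), I could not use them within a batch (I need to refer to a node created earlier in the same batch).

I know I could enable auto_indexing and work with the data store directly instead of the legacy index, but since 2.0.0 the schema index is recommended over the legacy index, so I switched to the batch + cypher + schema index approach.

HOWEVER, with batch + cypher I only get about 200 job descriptions per second. It would be much higher if MERGE with constraints always worked, say about 600~800/s, but that is still far below 5k/s. I also tried the schema index without any constraints, and the insert rate was even lower.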

Answer:

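With 2.0 I would use the transactional endpoint and create your statements in batches, e.g. 100 or 1000 per HTTP request and about 30k-50k per transaction (until you commit).

See here for the format of the new streaming, transactional endpoint: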

http://docs.neo4j.org/chunked/milestone/rest-api-transactional.html
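
As a rough illustration of the batching described above, the sketch below builds request bodies in the `{"statements":[...]}` format documented at that link, with the queries from 3.1 and 3.2 rewritten to use parameters. The class name, the helper names, the parameter names and the chunk size are all assumptions made for this example, and the HTTP calls themselves (POST to /db/data/transaction, then to the returned transaction URL, then to its commit URL) are left out.

```java
import java.util.ArrayList;
import java.util.List;

class TxPayloadSketch {
    // Parameterized forms of the queries in 3.1 and 3.2 (parameter names are assumptions).
    static final String MERGE_POST =
        "MERGE (child:Post {postId:{postId}, userId:{userId}})";
    static final String MERGE_REPOST =
        "MERGE (parent:Post {postId:{pId}, userId:{pUser}}) "
        + "MERGE (child:Post {postId:{cId}, userId:{cUser}}) "
        + "CREATE UNIQUE parent-[r:REPOST {ids:{ids}, created_at:{createdAt}}]->child";

    /** Minimal JSON string escaping, enough for the values used here. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** One entry of the "statements" array for a single-node record (3.1). */
    public static String postStatement(long postId, long userId) {
        return "{\"statement\":\"" + MERGE_POST + "\",\"parameters\":{\"postId\":"
            + postId + ",\"userId\":" + userId + "}}";
    }

    /** One entry of the "statements" array for a two-node record (3.2). */
    public static String repostStatement(long pId, long pUser, long cId, long cUser,
                                         String createdAt) {
        return "{\"statement\":\"" + MERGE_REPOST + "\",\"parameters\":{"
            + "\"pId\":" + pId + ",\"pUser\":" + pUser
            + ",\"cId\":" + cId + ",\"cUser\":" + cUser
            + ",\"ids\":" + quote(pId + "_" + cId)
            + ",\"createdAt\":" + quote(createdAt) + "}}";
    }

    /** Wraps already-rendered statements in one request body. */
    public static String payload(List<String> statements) {
        return "{\"statements\":[" + String.join(",", statements) + "]}";
    }

    /** Splits the statements into chunks of at most `size`, one chunk per HTTP request. */
    public static List<List<String>> chunk(List<String> all, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < all.size(); i += size) {
            out.add(new ArrayList<>(all.subList(i, Math.min(i + size, all.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        all.add(postStatement(1001, 901));
        all.add(repostStatement(1002, 902, 1003, 903, "Wed Nov 06 14:06:56 AST 2013"));
        // Each chunk would be POSTed inside the same open transaction before committing.
        for (List<String> part : chunk(all, 1000)) {
            System.out.println(payload(part));
        }
    }
}
```

Sending values as parameters rather than inlining them keeps the statement text identical across records, which should let the server reuse the parsed query instead of handling a new string for every record.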

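Also, for such a high-performance, continuous-insert endpoint, I strongly recommend writing a server extension that runs against the embedded API and can easily insert 10k or more nodes and relationships per second. See the documentation: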

http://docs.neo4j.org/chunked/milestone/server-unmanaged-extensions.html

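For pure inserts you don't need Cypher. For concurrency, take a lock on a well-known node (one per subgraph you are inserting into) so that concurrent inserts are not an issue; you can do that with tx.acquireWriteLock() or by removing a non-existent property from a node (REMOVE n.__lock__).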
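
One way to picture the REMOVE n.__lock__ variant is to put a lock statement in front of the inserts that share a transaction. The sketch below only builds the statement list; it assumes a hypothetical lock node labelled InsertLock with a name property was created beforehand, and both names are invented for this example. The lock only helps if the lock statement and the inserts run in the same transaction.

```java
import java.util.ArrayList;
import java.util.List;

class LockFirstSketch {
    // Assumption: a node (:InsertLock {name: ...}) exists, one per subgraph
    // that receives concurrent inserts. Removing a property that is not set
    // changes no data; it is used here only to take the node's write lock.
    static final String TAKE_LOCK =
        "MATCH (l:InsertLock {name:{lockName}}) REMOVE l.__lock__";

    /** Minimal JSON string escaping for the lock name and the statement text. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders the lock statement as one entry of a "statements" array. */
    public static String lockStatement(String lockName) {
        return "{\"statement\":" + quote(TAKE_LOCK)
            + ",\"parameters\":{\"lockName\":" + quote(lockName) + "}}";
    }

    /** Returns a new list with the lock statement ahead of the insert statements. */
    public static List<String> withLock(String lockName, List<String> inserts) {
        List<String> out = new ArrayList<>();
        out.add(lockStatement(lockName));
        out.addAll(inserts);
        return out;
    }

    public static void main(String[] args) {
        List<String> inserts = new ArrayList<>();
        inserts.add("{\"statement\":\"MERGE (p:Post {postId:1001, userId:901})\"}");
        System.out.println(
            "{\"statements\":[" + String.join(",", withLock("posts", inserts)) + "]}");
    }
}
```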

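For another example of an unmanaged extension (one that does use cypher), check out this project. It even has a mode that might help you (POSTing CSV files to the server endpoint, to be executed with one cypher statement per row).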

https://github.com/jexp/cypher-rs
