I am working on an application that uses a Neo4j graph containing about 10 million nodes. One of the main tasks I run daily is a batch import of new / updated nodes into the graph, on the order of 1-2 million nodes. After experimenting with Python scripts in combination with the Cypher query language, I decided to try the embedded graph with the Java API to get better performance.
What I found is roughly a 5x improvement using the native Java API. I am using Neo4j 2.1.4, which I believe is the latest version. I have read in other posts that the embedded graph is a little faster, but that this should / may change in the near future. Has anyone observed similar results and can confirm these findings?
I have included the snippets below to give a general idea of the methods used - the code has been greatly simplified.
Cypher / Python sample:
cnode = self.graph_db.create(node(hash = obj.hash,
                                  name = obj.title,
                                  date_created = str(datetime.datetime.now()),
                                  date_updated = str(datetime.datetime.now())
                                  ))
Sample from the embedded graph using Java:
final Node n = Graph.graphDb.createNode();
for (final Label label : labels) {
    n.addLabel(label);
}
for (Map.Entry<String, Object> entry : properties.entrySet()) {
    n.setProperty(entry.getKey(), entry.getValue());
}
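For context, here is a minimal sketch of how an import of this size could be split into fixed-size batches, so that each batch can be written in its own transaction instead of one transaction per node or one huge transaction for everything. The class name `BatchImport`, the helper `inBatches` and the batch size are hypothetical and not taken from my actual code. The sketch uses only the Java standard library (Java 8 lambdas) and leaves the Neo4j calls as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only: not part of the Neo4j API.
class BatchImport {

    /** Splits items into consecutive batches of at most batchSize and passes each batch to the handler. */
    static <T> void inBatches(List<T> items, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Assumption: with the embedded API, the handler would open one
            // transaction here, run the createNode() / setProperty() calls for
            // every item in the batch, then mark the transaction successful.
            handler.accept(items.subList(start, end));
        }
    }

    public static void main(String[] args) {
        List<String> hashes = Arrays.asList("a", "b", "c", "d", "e");
        List<Integer> batchSizes = new ArrayList<>();
        inBatches(hashes, 2, batch -> batchSizes.add(batch.size()));
        System.out.println(batchSizes); // prints [2, 2, 1]
    }
}
```

In a real run the batch size would be much larger than 2. I have not measured which value works best for 1-2 million nodes.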
Thanks for any insights!