Retrieving data from Neo4j using PySpark

I have time series that are currently stored as a graph (using a time tree structure similar to this) in a Neo4j server, version 2.3.6 (so there is only a REST interface, no Bolt). What I'm trying to do is run some analytics on these time series in a distributed way using PySpark.

Now, I'm aware of existing projects for connecting Spark to Neo4j, in particular those listed here. The problem is that they focus on providing an interface for working with graphs. In my case the graph structure is not relevant, since my Neo4j Cypher queries are designed to produce arrays of values. Everything downstream is about processing these arrays as time series, not as a graph.
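To give an idea of what such a query looks like, here is a minimal sketch over Neo4j's transactional Cypher HTTP endpoint (available on 2.3.x, no Bolt needed). The host, labels, relationship types and properties are hypothetical placeholders for my time tree; the point is that the result is a plain list of (timestamp, value) pairs, not a graph:

```python
# Minimal sketch: one Cypher query over the REST transactional endpoint.
# Labels/relationships/properties below are placeholders for the time tree.
import requests

NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"  # assumed host

query = """
MATCH (y:Year {value: 2016})-[:HAS_MONTH]->(:Month)-[:HAS_DAY]->(:Day)
      -[:HAS_VALUE]->(v:Value)
RETURN v.timestamp AS ts, v.value AS value
ORDER BY ts
"""

payload = {"statements": [{"statement": query}]}
resp = requests.post(NEO4J_URL, json=payload)  # add auth=(...) if the server requires it
resp.raise_for_status()

# Flatten the response into a plain array of [timestamp, value] rows.
rows = [record["row"] for record in resp.json()["results"][0]["data"]]
# rows is now e.g. [[1451606400, 0.73], [1451692800, 0.75], ...]
```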

My question is: has anyone successfully queried a REST-only Neo4j instance in parallel from PySpark, and if so, how did you do it? The py2neo library seemed like a good candidate until I realized that its connection object cannot be shared across partitions (or, if it can, I don't know how to do it). Right now I'm considering having my Spark tasks issue independent REST requests to the Neo4j server (a sketch of that idea is below), but I wanted to see how the community might approach this problem.
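Here is what I mean by "independent REST requests per task", as a sketch only. Each partition opens its own HTTP session on the worker (so no connection object is shipped from the driver) and runs one Cypher query per series it is assigned. The host URL, series identifiers and query pattern are assumptions for illustration:

```python
# Sketch: each partition creates its own HTTP session and queries the
# Neo4j transactional Cypher endpoint independently. Host, schema and
# series identifiers are hypothetical.
import requests
from pyspark import SparkContext

NEO4J_URL = "http://neo4j-host:7474/db/data/transaction/commit"  # assumed host

CYPHER = """
MATCH (d:Day)-[:HAS_VALUE]->(v:Value)    // hypothetical time-tree pattern
WHERE d.seriesId = {seriesId}
RETURN v.timestamp AS ts, v.value AS value
ORDER BY ts
"""

def fetch_partition(series_ids):
    """Issue one independent REST request per series inside this partition."""
    session = requests.Session()  # created on the worker, never serialized
    # session.auth = ("neo4j", "<password>")  # if authentication is enabled
    for series_id in series_ids:
        payload = {"statements": [{"statement": CYPHER,
                                   "parameters": {"seriesId": series_id}}]}
        resp = session.post(NEO4J_URL, json=payload)
        resp.raise_for_status()
        for record in resp.json()["results"][0]["data"]:
            ts, value = record["row"]
            yield (series_id, ts, value)

sc = SparkContext(appName="neo4j-timeseries")
series_ids = ["sensor-1", "sensor-2", "sensor-3"]  # assumed identifiers
rdd = sc.parallelize(series_ids, numSlices=3).mapPartitions(fetch_partition)
print(rdd.take(5))
```

The design choice here is simply to parallelize over series identifiers (or time ranges) and let each worker talk to Neo4j directly, rather than trying to share a single py2neo connection across partitions.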

Best, Aurélien

python neo4j pyspark

No one has answered this question yet.
