Neo4j Output Format

After working with Neo4j for a while, I am now looking at how to make my own entity manager (object manager) work with the data fetched into the application, and I wonder about Neo4j's output format.

When I run a query, the result is always returned as tabular data. Why is this? Of course, tables hold a large place in data processing, but it seems strange that a graph database can only output in this format.

Now, when I want to create an object graph in my application, I would need to deduplicate all the objects myself, which is not good for performance and does not take advantage of a true graph.

Consider MATCH (A)-->(B) RETURN A, B. When there is one A and three B, it will return:

 A  B
 1  1
 1  2
 1  3

The same A is transmitted three times over the connection from the database, although I need it only once, and I know this before the data is retrieved.
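To make the duplication concrete, here is a minimal Python sketch of the work the client has to do today. It assumes each result row arrives as a pair of dicts carrying a hypothetical `id` key; the row shape and the `dedupe_rows` helper are illustrative only and not part of any Neo4j driver.

```python
def dedupe_rows(rows):
    """Collapse tabular (A, B) rows into unique nodes plus an edge list."""
    nodes = {}  # node id -> node dict, each node stored once
    edges = []  # (a_id, b_id) pairs, one per result row
    for a, b in rows:
        nodes.setdefault(a["id"], a)
        nodes.setdefault(b["id"], b)
        edges.append((a["id"], b["id"]))
    return nodes, edges


# One A (id 1) related to three B nodes (ids 2, 3, 4); A is sent in every row.
rows = [
    ({"id": 1, "name": "A"}, {"id": 2, "name": "B1"}),
    ({"id": 1, "name": "A"}, {"id": 3, "name": "B2"}),
    ({"id": 1, "name": "A"}, {"id": 4, "name": "B3"}),
]
nodes, edges = dedupe_rows(rows)
print(len(rows), "rows ->", len(nodes), "nodes,", len(edges), "edges")
```

The three copies of A collapse into one entry, but only after all three have already crossed the wire.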


Something like this seems ideal: http://nigelsmall.com/geoff. load2neo is nice; a load-from-neo would be nice too, either in the Geoff format or in any of the other formats listed at https://gephi.org/users/supported-graph-formats/

Each language can then implement its own functions for directly creating objects.

To clarify:

  • Relations between nodes are lost in tabular data
  • Redundant (non-optimal) format for graphs
  • Edges (relationships) and vertices (nodes) are usually not in the same table. (makes queries more complicated?)

Another consideration (which may deserve its own post): what is a good way to model relationships in an object graph? As objects of their own? Or as data/methods inside the node objects?


@Kikohs
Q: What do you mean by "Each language can then implement its own functions for directly creating objects"?

A: Using the (partial) graph provided by the database (as the result of a query), PHP could provide a factory method (preferably implemented in C) for constructing an object graph, which is usually an expensive operation. But this only works if the object graph is well defined in a standard format, because such a function should be simple and universal.
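As a rough illustration of such a factory, here is a sketch in Python (standing in for PHP). The payload layout with `nodes` and `relationships` keys, the `Node` class, and `build_object_graph` are all hypothetical names chosen for the example; no existing Neo4j output format is implied.

```python
class Node:
    """Plain application object; `out` holds direct references to neighbours."""

    def __init__(self, node_id, properties):
        self.id = node_id
        self.properties = properties
        self.out = []  # list of (relationship type, Node) tuples


def build_object_graph(payload):
    """Create each node object exactly once, then wire up the relationships."""
    objects = {
        n["id"]: Node(n["id"], n.get("properties", {}))
        for n in payload["nodes"]
    }
    for rel in payload["relationships"]:
        objects[rel["start"]].out.append((rel["type"], objects[rel["end"]]))
    return objects


# Hypothetical deduplicated payload: every node and relationship appears once.
payload = {
    "nodes": [
        {"id": 1, "properties": {"name": "A"}},
        {"id": 2, "properties": {"name": "B"}},
    ],
    "relationships": [{"start": 1, "end": 2, "type": "KNOWS"}],
}
graph = build_object_graph(payload)
```

Because the input is already deduplicated, the factory needs no identity checks: one pass over the nodes and one pass over the relationships.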

Q: Do you want to export the full graph or just the query result?
A: The query result. However, a query such as MATCH (n) OPTIONAL MATCH (n)-[r]-() RETURN n, r should return the full graph.

Q: Do you want to dump a subgraph created from the query result to disk?
A: No, I would prefer to receive the query result through the existing interfaces, such as REST.

Q: Do you want to create a subgraph from a query in memory and then query it in another language?
A: No. I want the query result in a format other than tabular (examples were provided).

Q: Suppose you make a query that returns only the name of a node. Would you like to get the full node attached, or just the name? The same goes for the edges.
A: Nodes have no names. They have properties, labels, and relationships. I would like to get enough information to obtain A) the identifier of the node, its labels, and its properties, and B) its relationships to other nodes that are in the same result.

Note that the first part of the question is not a specific "how" question but rather "why is this not possible?" (or, if it is possible, I would be happy to be proven wrong). The second is a real practical question, namely "how to model relationships". What both have in common is that they try to answer the question "how to efficiently get graph data into PHP".


@Michael Hunger

You have a point when you say that not all result data can be expressed as an object graph. It is reasonable to say that an alternative output format would only supplement the tabular format, not replace it.

As far as I understand from your answer, the natural (raw) output format of the database is the result format with duplicates in it ("it streams the data out as it arrives"). In that case, I understand that it is currently up to another program (the dev stack) to perform the mapping. So my conclusion about Neo4j implementing something like this is:
Pro: no need to do this in every implementation language (application)
Con: 1) application-specific mapping is not possible, 2) no performance gain if the implementation language is already fast

"Even if you use geoff, graphml, or the gephi format, you must keep all the data in memory to deduplicate the results."
I don't quite understand this point. Are you saying that these formats cannot hold deduplicated results (in some cases)? So in fact there is no possible text format in which a graph can be described without duplication?

"There are also questions about what you want to include in your output?"
I was under the assumption that the Cypher language is powerful enough to specify this in the query. The output format would then contain whatever the database can provide as a result.

"You can simply return the paths you receive that are unique paths through the graph itself."
Useful suggestion, I will play with this idea :)

"The neo4j-shell dump command takes an approach of pulling the Cypher results into an in-memory structure, enriching it."
Does the enrichment process receive additional data from the database or data already contained in the original result?

+7
2 answers

There is more to it.

First of all, as you said, tabular query results are really common and are needed for integration with other systems and databases.

Secondly, most of the time you do not actually return raw graph data from your queries; you aggregate, project, slice, and extract information from your graph. The relationship to the original graph data is therefore already lost in most of the query results I see being used.

The only time people need or use raw graph data is when exporting subgraph data from the database as a query result.

The problem with returning this as a deduplicated graph is that the db must first fetch all the result data into memory in order to deduplicate it, extract the needed relationships, and so on.

Normally it simply streams the data out as it comes and uses little memory.

Even if you use geoff, graphml, or the gephi format, you must store all the data in memory to deduplicate the results (which are returned as paths with potential duplicate nodes and relationships).
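The memory cost can be illustrated with a small Python sketch. It assumes paths arrive as dicts holding hypothetical `nodes` and `rels` id lists; `deduplicate_paths` is an illustrative helper, not a Neo4j API. Even in this favourable case, where items can be emitted as soon as they are first seen, the sets of seen ids grow with the size of the result, whereas a plain stream keeps no state at all. A format that lists all nodes before any relationship would additionally need the whole result buffered.

```python
def deduplicate_paths(paths):
    """Yield each node and relationship id once, in order of first appearance."""
    seen_nodes, seen_rels = set(), set()  # state grows with the result size
    for path in paths:
        for kind, ids, seen in (
            ("node", path["nodes"], seen_nodes),
            ("rel", path["rels"], seen_rels),
        ):
            for item_id in ids:
                if item_id not in seen:
                    seen.add(item_id)
                    yield kind, item_id


# Three paths; node 1 appears in all of them, the third repeats the first.
paths = [
    {"nodes": [1, 2], "rels": [10]},
    {"nodes": [1, 3], "rels": [11]},
    {"nodes": [1, 2], "rels": [10]},
]
emitted = list(deduplicate_paths(paths))
print(emitted)
```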

There are also questions about what you want to include in your output. Only the nodes and rels that were returned? Or, in addition, all the other relationships between the nodes you return? Or all the rels of the returned nodes (but then you would also have to include the end nodes of those relationships)?

You can simply return the paths you found, which are unique paths through the graph in themselves:

 MATCH p = (n)-[r]-(m) WHERE ... RETURN p 

Another way to address this problem in Neo4j is to use sensible aggregations.

For example, you can use collect to aggregate data per node (i.e. a kind of subgraph view):

 MATCH (n)-[r]-(m) WHERE ... RETURN n, collect([r,type(r),m]) 

or use the new literal map syntax (Neo4j 2.0):

 MATCH (n)-[r]-(m) WHERE ... RETURN {node: n, neighbours: collect({ rel: r, type: type(r), node: m})} 
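On the client side, rows shaped like the map above could be folded into an adjacency structure roughly as follows. This is only a sketch: it assumes each row has already been decoded into a Python dict with `node` and `neighbours` keys, and that node values carry a hypothetical `id` field; `adjacency_from_rows` is an illustrative name.

```python
def adjacency_from_rows(rows):
    """Map each node id to a list of (relationship type, neighbour id) pairs."""
    adjacency = {}
    for row in rows:
        source_id = row["node"]["id"]
        adjacency[source_id] = [
            (nb["type"], nb["node"]["id"]) for nb in row["neighbours"]
        ]
    return adjacency


# Two rows, one per node, as the aggregated query would produce them.
rows = [
    {"node": {"id": 1}, "neighbours": [
        {"rel": {"id": 10}, "type": "KNOWS", "node": {"id": 2}},
    ]},
    {"node": {"id": 2}, "neighbours": [
        {"rel": {"id": 10}, "type": "KNOWS", "node": {"id": 1}},
    ]},
]
adjacency = adjacency_from_rows(rows)
print(adjacency)
```

Each node now arrives once per row instead of once per relationship. Because the pattern is undirected, the same relationship still shows up under both of its endpoints, as relationship 10 does here.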

The neo4j-shell dump command takes the approach of pulling the Cypher results into an in-memory structure, enriching it, and then outputting it as Cypher CREATE statement(s).

A similar approach can be used for other output formats, if you need it. But so far there has been no need.

If you really need this functionality, it makes sense to write a server extension that uses Cypher to specify the query but does not allow RETURN statements. Instead, it would always use RETURN * and aggregate the data into an in-memory structure (SubGraph in the org.neo4j.cypher packages), and then render it in a suitable format (for example, JSON or one of those mentioned above).

These can be starting points for this:

There are other efforts, such as GraphJSON from GraphAlchemist: https://github.com/GraphAlchemist/GraphJSON

And the d3 json format is also very useful. We use it in the neo4j console (console.neo4j.org) to return graph visualization data that is then consumed directly by d3.
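As an illustration of that kind of rendering, the sketch below turns a deduplicated set of nodes and relationships into the nodes/links layout commonly fed to d3 force layouts, where each link refers to its endpoints by index into the nodes array. The input shape and the `to_d3_json` helper are assumptions made for the example; the exact payload the console returns is not shown here.

```python
import json


def to_d3_json(nodes, relationships):
    """Render nodes and relationships as a d3-style nodes/links document."""
    # d3 force layouts can address link endpoints by position in "nodes".
    index = {node["id"]: i for i, node in enumerate(nodes)}
    links = [
        {
            "source": index[rel["start"]],
            "target": index[rel["end"]],
            "type": rel["type"],
        }
        for rel in relationships
    ]
    return json.dumps({"nodes": nodes, "links": links})


nodes = [{"id": 7, "name": "A"}, {"id": 9, "name": "B"}]
relationships = [{"start": 7, "end": 9, "type": "KNOWS"}]
document = to_d3_json(nodes, relationships)
print(document)
```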

+12

I have been working with Neo4j for some time now, and I can tell you that if you are worried about memory and performance, you should abandon Cypher completely and instead use indexes and the other graph traversal methods (for example, get all relationships of a certain type from or to a start node, and then iterate over the nodes found).

As stated in the documentation, Cypher is not intended to be used in an application, but rather as an administration tool. In addition, in production environments it is very easy to crash the server by running the wrong query.

Secondly, the API documentation does not mention any method for retrieving the output as a graph-like structure. You will need to process the output of the query and build it yourself.

However, in the example you give, you say that there is only one A and you know this before the data is extracted, so you do not need to do:

 MATCH (A)-->(B) RETURN A, B 

but just

 MATCH (A)-->(B) RETURN B 

(you don't need to get A three times because you already know that these are the nodes connected to A)

or better (if you need relationship information) something like

 MATCH (A)-[r]->(B) RETURN r 
-2