Neo4j: Replace multiple nodes with the same property with one node

Say my nodes in Neo4j have a name property. Now I want to ensure that there is at most one node for a given name, by merging all nodes that share the same name. More precisely: if there are three nodes where the name is “dog”, I want them to be replaced with a single node with the name “dog”, which:

  • Collects all properties from the three source nodes.
  • Has all relationships that were attached to the three source nodes.

The background for this is as follows: in my graph there are often several nodes with the same name, which should be considered "equal" (although some carry richer property information than others). Putting a.name = b.name in the WHERE clause of every query is very slow.

EDIT: I forgot to mention that I am currently running Neo4j 2.3.7 (I cannot upgrade it).

SECOND EDIT: There is a known list of labels for the nodes and of possible relationship types. The type of each node is known.

THIRD EDIT: I want to call the “node collapse” procedure described above from Java, so a solution that mixes Cypher queries with procedural code would also be helpful.

java graph neo4j cypher
2 answers

I created a test dataset with the following schema:

    // Five nodes; three of them share the name 'B'
    CREATE (n1:TestX {name:'A', val1:1})
    CREATE (n2:TestX {name:'B', val2:2})
    CREATE (n3:TestX {name:'B', val3:3})
    CREATE (n4:TestX {name:'B', val4:4})
    CREATE (n5:TestX {name:'C', val5:5});

    // A -> first B node
    MATCH (n6:TestX {name:'A', val1:1})
    MATCH (m7:TestX {name:'B', val2:2})
    CREATE (n6)-[:TEST]->(m7);

    // second B node -> C
    MATCH (n8:TestX {name:'C', val5:5})
    MATCH (m10:TestX {name:'B', val3:3})
    CREATE (n8)<-[:TEST]-(m10);

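This produces the following graph (image not preserved): A points to the first B node, the second B node points to C, and the third B node has no relationships.

Here the three B nodes really represent the same node. And here is my solution: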

    // copy all properties
    MATCH (n:TestX), (m:TestX)
    WHERE n.name = m.name AND ID(n) < ID(m)
    WITH n, m
    SET n += m;

    // copy all outgoing relations
    MATCH (n:TestX), (m:TestX)-[r:TEST]->(endnode)
    WHERE n.name = m.name AND ID(n) < ID(m)
    WITH n, collect(endnode) AS endnodes
    FOREACH (x IN endnodes | CREATE (n)-[:TEST]->(x));

    // copy all incoming relations
    MATCH (n:TestX), (m:TestX)<-[r:TEST]-(endnode)
    WHERE n.name = m.name AND ID(n) < ID(m)
    WITH n, collect(endnode) AS endnodes
    FOREACH (x IN endnodes | CREATE (n)<-[:TEST]-(x));

    // delete duplicates
    MATCH (n:TestX), (m:TestX)
    WHERE n.name = m.name AND ID(n) < ID(m)
    DETACH DELETE m;

The resulting graph is as follows (image not preserved): a single B node remains, carrying val2, val3 and val4, with the relationships A -> B and B -> C.

Note that you need to know the relationship types in advance, because each type needs its own pair of copy statements.
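Since the question mentions a known list of relationship types and a Java caller, the statements above could be generated per type. The following is only a sketch: the class `CollapseQueries`, the method `build`, and the extra type name `OTHER` are hypothetical, and how the strings are sent to the database is deliberately left out. Labels and relationship types are spliced in as text because Cypher does not accept them as query parameters, so the sketch assumes they are trusted, plain identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollapseQueries {

    // Hypothetical helper: builds the collapse statements for one label and
    // a list of known relationship types, following the queries shown above.
    // Assumes label and type names are trusted, plain identifiers.
    static List<String> build(String label, List<String> relTypes) {
        String where = " WHERE n.name = m.name AND ID(n) < ID(m) ";
        String n = "(n:" + label + ")";
        String m = "(m:" + label + ")";
        List<String> statements = new ArrayList<>();

        // 1. Copy properties from the higher-ID node to the lower-ID node.
        statements.add("MATCH " + n + ", " + m + where + "SET n += m");

        // 2. For every known type, copy outgoing and incoming relationships.
        for (String type : relTypes) {
            String rel = "[:" + type + "]";
            statements.add("MATCH " + n + ", " + m + "-" + rel + "->(e)" + where
                    + "WITH n, collect(e) AS ends "
                    + "FOREACH (x IN ends | CREATE (n)-" + rel + "->(x))");
            statements.add("MATCH " + n + ", " + m + "<-" + rel + "-(e)" + where
                    + "WITH n, collect(e) AS ends "
                    + "FOREACH (x IN ends | CREATE (n)<-" + rel + "-(x))");
        }

        // 3. Delete the duplicates last, once everything has been copied.
        statements.add("MATCH " + n + ", " + m + where + "DETACH DELETE m");
        return statements;
    }

    public static void main(String[] args) {
        // "OTHER" is a made-up second relationship type for illustration.
        for (String s : build("TestX", Arrays.asList("TEST", "OTHER"))) {
            System.out.println(s + ";");
        }
    }
}
```

The statements are meant to be run in the returned order, each as its own query, so that the deletion only happens after all properties and relationships have been copied.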

All properties are copied from nodes with "higher" identifiers to nodes with "lower" identifiers.
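To make the direction of the copy concrete, here is a small plain-Java model of what `SET n += m` does to the property map of n. It is an illustration only, not Neo4j API: the class `PropertyMerge` and the method `plusEquals` are hypothetical names. Keys that exist only on m are added, keys that exist only on n are kept, and on a shared key the value from m replaces the one on n.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyMerge {

    // Models SET n += m on plain maps: returns n's properties after the update.
    static Map<String, Object> plusEquals(Map<String, Object> n, Map<String, Object> m) {
        Map<String, Object> merged = new LinkedHashMap<>(n);
        merged.putAll(m); // m's values win on shared keys
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> lower = new LinkedHashMap<>();  // node with the lower ID
        lower.put("name", "B");
        lower.put("val2", 2);

        Map<String, Object> higher = new LinkedHashMap<>(); // node with the higher ID
        higher.put("name", "B");
        higher.put("val3", 3);

        System.out.println(plusEquals(lower, higher)); // prints {name=B, val2=2, val3=3}
    }
}
```

If two duplicates hold different values for the same key, the value from the higher-ID node overwrites the other one, so this approach suits data where duplicates complement rather than contradict each other.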


I think you need something like a synonym for nodes.

1) Go through all the nodes and create a synonym node for each distinct name:

    MATCH (N)
    WITH N
    MERGE (S:Synonym {name: N.name})
    MERGE (S)<-[:hasSynonym]-(N)
    RETURN count(S);

2) Remove synonyms with only one node:

    MATCH (S:Synonym)
    WITH S
    MATCH (S)<-[:hasSynonym]-(N)
    WITH S, count(N) AS count
    WITH S WHERE count = 1
    DETACH DELETE S;

3) Transfer properties and relationships to the remaining synonyms (with APOC, which requires Neo4j 3.0 or later):

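    MATCH (S:Synonym)
    WITH S
    MATCH (S)<-[:hasSynonym]-(N)
    // group explicitly by S, then put S first so it is the node merged into
    WITH S, collect(N) AS nodes
    WITH [S] + nodes AS nodesForMerge
    CALL apoc.refactor.mergeNodes( nodesForMerge ) YIELD node
    RETURN count(node);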

4) Remove the Synonym label:

    MATCH (S:Synonym)<-[:hasSynonym]-(N)
    CALL apoc.create.removeLabels( [S], ['Synonym'] ) YIELD node
    RETURN count(node);
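For readers who want to reason about the grouping outside the database, the idea behind these steps can be modelled in plain Java. This is a rough in-memory sketch, not Neo4j or APOC API: the class `SynonymCollapse` and its methods are hypothetical, nodes are reduced to an ID and a name, and relationships to (from, to) ID pairs. As an assumption borrowed from the first answer, the node with the lowest ID represents each name, whereas the APOC steps above merge into the Synonym node instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SynonymCollapse {

    // Maps every node ID to the ID that represents its name.
    // IDs are visited in ascending order, so the lowest ID per name wins.
    static Map<Long, Long> representatives(Map<Long, String> namesById) {
        Map<String, Long> firstByName = new HashMap<>();
        Map<Long, Long> rep = new TreeMap<>();
        for (Map.Entry<Long, String> e : new TreeMap<Long, String>(namesById).entrySet()) {
            firstByName.putIfAbsent(e.getValue(), e.getKey());
            rep.put(e.getKey(), firstByName.get(e.getValue()));
        }
        return rep;
    }

    // Re-points each (from, to) relationship at the representatives of its
    // endpoints. Assumes every endpoint ID is present in the map.
    static List<long[]> rewire(List<long[]> rels, Map<Long, Long> rep) {
        List<long[]> out = new ArrayList<>();
        for (long[] r : rels) {
            out.add(new long[] { rep.get(r[0]), rep.get(r[1]) });
        }
        return out;
    }

    public static void main(String[] args) {
        // Same shape as the test data of the first answer; the IDs are made up.
        Map<Long, String> names = new HashMap<>();
        names.put(1L, "A");
        names.put(2L, "B");
        names.put(3L, "B");
        names.put(4L, "B");
        names.put(5L, "C");

        List<long[]> rels = new ArrayList<>();
        rels.add(new long[] { 1, 2 }); // A -> first B
        rels.add(new long[] { 3, 5 }); // second B -> C

        for (long[] r : rewire(rels, representatives(names))) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Nodes whose representative is their own ID are the survivors; every other node is a duplicate that would be removed once its relationships have been re-pointed.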