Getting the first n entries for each group in Neo4j

I need to group data from a Neo4j database and then keep only the top n entries of each group.

Example:

I have two types of node: Order and Article. There is an "ADDED" relationship between them, and it has a timestamp property. For each article, I want to know how many times it was among the first two articles added to an order. I tried the following approach:

  • get every (Order)-[ADDED]->(Article) match;

  • sort the result from step 1 by order ID as the first sort key and by the timestamp of the ADDED relationship as the second sort key;

  • for each subgroup from step 2 representing one order, keep only the top 2 rows;

  • count the occurrences of each article ID in the output of step 3.

My problem is that I got stuck at step 3. Is it possible to get the top two rows for each subgroup representing an order?

Thanks,

Tiberius

+7
neo4j
3 answers

Try

    MATCH (o:Order)-[r:ADDED]->(a:Article)
    WITH o, r, a
    ORDER BY o.oid, r.t
    WITH o, COLLECT(a)[..2] AS topArticlesByOrder
    UNWIND topArticlesByOrder AS a
    RETURN a.aid AS articleId, COUNT(*) AS count

The results look like

    articleId  count
    8          6
    2          2
    4          5
    7          2
    3          3
    6          5
    0          7

on this sample graph created with

    FOREACH (opar IN RANGE(1,15) |
      MERGE (o:Order {oid: opar})
      FOREACH (apar IN RANGE(1,5) |
        MERGE (a:Article {aid: toInteger(RAND()*10)})
        CREATE (o)-[:ADDED {t: timestamp() - toInteger(RAND()*1000)}]->(a)
      )
    )
+7

Use LIMIT in combination with ORDER BY to get the top N of anything. For example, the top 5 scores:

 MATCH (node:MyScoreNode) RETURN node ORDER BY node.score DESC LIMIT 5; 

The ORDER BY ensures that the highest scores are displayed first. LIMIT gives you only the first 5, which, since they are sorted, are always the highest.
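
As a rough illustration outside Cypher, the same sort-then-truncate idea can be sketched in Python. The node names and scores below are made up for the example, and `top_n` is a hypothetical helper, not part of any Neo4j API. Like ORDER BY with LIMIT at the end of a query, it truncates the whole result rather than each group separately.

```python
# Hypothetical rows standing in for MyScoreNode nodes (invented data).
nodes = [
    {"name": "a", "score": 12},
    {"name": "b", "score": 40},
    {"name": "c", "score": 7},
    {"name": "d", "score": 33},
    {"name": "e", "score": 25},
    {"name": "f", "score": 18},
    {"name": "g", "score": 3},
]


def top_n(rows, n, key):
    """Mimic ORDER BY <key> DESC LIMIT n: sort descending, keep the first n rows."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


top5 = top_n(nodes, 5, "score")
print([row["name"] for row in top5])
```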

+2

I tried to achieve the desired results and could not.

So my guess is that this is not possible in pure Cypher.

What is the problem? Cypher sees everything as a path, and it is actually performing a traversal.
Grouping the results and then filtering within each group means that Cypher would have to fork somehow at certain points. But Cypher applies the filter to all results at once, because they are treated as a single set of different paths.

My suggestion is to write several queries that together achieve the desired functionality and to implement some of the logic on the client.
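
A minimal sketch of what such client logic might look like in Python, assuming the client has already fetched one row per ADDED relationship as (order ID, timestamp, article ID). The rows and the `count_first_n` helper are hypothetical, and no database driver is involved here.

```python
from collections import Counter
from itertools import groupby, islice
from operator import itemgetter

# Hypothetical rows, as a client might receive them from a query such as
#   MATCH (o:Order)-[r:ADDED]->(a:Article) RETURN o.oid, r.t, a.aid
# Each tuple is (order_id, added_timestamp, article_id); the data is invented.
rows = [
    (1, 300, "A"),
    (1, 100, "B"),
    (1, 200, "C"),
    (2, 50, "B"),
    (2, 10, "A"),
    (3, 5, "C"),
]


def count_first_n(rows, n=2):
    """Count how often each article is among the first n added to an order."""
    # Sort by order ID first, then by the timestamp of the ADDED relationship.
    ordered = sorted(rows, key=itemgetter(0, 1))
    counts = Counter()
    # groupby needs sorted input; each group holds the rows of one order.
    for _order_id, group in groupby(ordered, key=itemgetter(0)):
        # Keep only the first n rows of this order and tally their articles.
        for _oid, _timestamp, article_id in islice(group, n):
            counts[article_id] += 1
    return counts


print(count_first_n(rows))
```

Sorting once and then grouping keeps the per-order truncation explicit, which is the step that was hard to express in the query itself.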

0