Return Top N rows for each group (Vertica / vsql)

A familiar question, but this time with Vertica. I would like to return the top 5 geo_country rows, ranked by SUM(imps), for each tag_id. This is the query I started with:

    SELECT tag_id,
           geo_country,
           SUM(imps) AS imps,
           RANK() OVER (PARTITION BY tag_id ORDER BY SUM(imps) DESC) AS rank
    FROM table1
    WHERE tag_id IN (2013150, 1981153)
      AND ymd > CURRENT_DATE - 3
    GROUP BY 1, 2
    LIMIT 10;

This actually returns only rows for the first tag in the WHERE clause (2013150). I know that the other tag has imps values high enough that it should be included in the results.

Also, how do I implement the Top N part? I tried adding a LIMIT clause inside the OVER clause, but it does not appear to be a recognized parameter.

2 answers

Solved. The solution is to turn the query into a subquery and then filter on rank in the outer WHERE clause:

    SELECT *
    FROM (
        SELECT tag_id,
               geo_country,
               SUM(imps),
               RANK() OVER (PARTITION BY tag_id ORDER BY SUM(imps) DESC) AS rank
        FROM table1
        WHERE tag_id IN (2013150, 1981153)
          AND ymd > CURRENT_DATE - 3
        GROUP BY 1, 2
    ) AS t2
    WHERE t2.rank <= 5;
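
One caveat with RANK(): countries that tie on SUM(imps) share the same rank, so a tag can come back with more than 5 rows. If exactly 5 rows per tag are needed, ROW_NUMBER() can be swapped in. A minimal sketch of that variant, reusing the table and column names from the question; the rn alias and the geo_country tie-breaker are my own assumptions, not part of the original query:

```sql
-- Sketch: same subquery pattern, but ROW_NUMBER() caps each tag at 5 rows.
-- The geo_country tie-breaker is an assumption; any deterministic column works.
SELECT tag_id, geo_country, imps
FROM (
    SELECT tag_id,
           geo_country,
           SUM(imps) AS imps,
           ROW_NUMBER() OVER (
               PARTITION BY tag_id
               ORDER BY SUM(imps) DESC, geo_country
           ) AS rn
    FROM table1
    WHERE tag_id IN (2013150, 1981153)
      AND ymd > CURRENT_DATE - 3
    GROUP BY tag_id, geo_country
) AS t2
WHERE rn <= 5
ORDER BY tag_id, rn;
```

Keep RANK() if tied countries should all be shown; use ROW_NUMBER() when the row count per tag has to be fixed.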

I think what happens here is that the GROUP BY orders your data by tag_id and then by geo_country, and the LIMIT then takes the first 10 rows. If the first tag_id has at least 10 geo_country values, you will see only that tag_id in your result. Sorting by rank in ascending order should solve your problem.

I'm not sure whether sorting by rank is allowed in Vertica, though.

    SELECT tag_id,
           geo_country,
           SUM(imps) AS imps,
           RANK() OVER (PARTITION BY tag_id ORDER BY SUM(imps) DESC) AS rank
    FROM table1
    WHERE tag_id IN (2013150, 1981153)
      AND ymd > CURRENT_DATE - 3
    GROUP BY 1, 2
    ORDER BY 4
    LIMIT 10;
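
On the question's side note about LIMIT and OVER: as far as I know, Vertica's LIMIT clause has a per-partition form, LIMIT n OVER (PARTITION BY ... ORDER BY ...), written at the end of the query rather than inside the window function. A minimal, untested sketch, assuming that form is available in your Vertica version; aggregating in a subquery first (aliased t here) is my own choice to keep the outer query simple:

```sql
-- Sketch (untested): per-partition LIMIT, assuming Vertica's
-- LIMIT ... OVER (PARTITION BY ... ORDER BY ...) form is available.
SELECT tag_id, geo_country, imps
FROM (
    -- Aggregate first so the outer query only has to pick the top rows.
    SELECT tag_id,
           geo_country,
           SUM(imps) AS imps
    FROM table1
    WHERE tag_id IN (2013150, 1981153)
      AND ymd > CURRENT_DATE - 3
    GROUP BY tag_id, geo_country
) AS t
LIMIT 5 OVER (PARTITION BY tag_id ORDER BY imps DESC);
```

Unlike ORDER BY 4 LIMIT 10, which caps the whole result at 10 rows, this keeps up to 5 rows for each tag_id no matter how many tags are in the WHERE clause.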