Top 5 rows per group in Pig

I want to return the top 5 rows of each group. I have a table of state names and their cities, grouped by state name. For each state, I want only its 5 best cities rather than all of them.

How can I do this in Pig? Thank you in advance.

1 answer

After the GROUP BY, use a nested FOREACH: first ORDER BY, then LIMIT. This sorts the cities within each group by size in descending order and then keeps the top 5.

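    B = GROUP A BY state;
    C = FOREACH B {
        -- Sort the cities of this state, largest first
        DA = ORDER A BY citysize DESC;
        -- Keep only the first 5 rows of the sorted bag
        DB = LIMIT DA 5;
        -- Flatten the bag once; flattening DB.citysize and DB.cityname
        -- separately would produce their cross product
        GENERATE group AS state, FLATTEN(DB.(cityname, citysize));
    }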
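For comparison, here is a rough end-to-end sketch that uses Pig's built-in TOP function in place of ORDER BY and LIMIT. The file name `cities.tsv`, the tab delimiter, and the schema are assumptions made for illustration and do not come from the question. Note that TOP does not guarantee that the 5 rows it returns are sorted.

```pig
-- Hypothetical input: tab-separated rows of (state, cityname, citysize)
A = LOAD 'cities.tsv' USING PigStorage('\t')
    AS (state:chararray, cityname:chararray, citysize:long);

B = GROUP A BY state;

-- TOP(n, column, bag): column is a 0-based index, so 2 refers to citysize.
-- Each returned tuple already carries its state, so group is not emitted again.
C = FOREACH B GENERATE FLATTEN(TOP(5, 2, A));

DUMP C;
```

If the rows must come out ordered within each state, the nested ORDER BY and LIMIT approach is the more direct choice.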