I have a set of records that I load from a file, and the first thing I need to do is get the max and min of a column. In SQL, I would do this with subqueries like this:
select c.state,
       c.population,
       (select max(c.population) from state_info c) as max_pop,
       (select min(c.population) from state_info c) as min_pop
from state_info c
I assume there is an easy way to do this in Pig, but I'm having a hard time finding it. Pig has MAX and MIN functions, but when I tried the following, it did not work:
records = LOAD '/Users/Winter/School/st_incm.txt' AS (state:chararray, population:int);
with_max = FOREACH records GENERATE state, population, MAX(population);
I did have some luck adding an extra column with the same value for every row, grouping on that column, and then taking the max over the resulting group. This seems like a convoluted way to get what I want, so I thought I would ask whether anyone knows an easier way.
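For reference, the workaround described above might look roughly like the following. This is a minimal sketch, not the exact script: the relation names `with_key`, `grouped`, and `with_stats` and the constant field `dummy` are made up for illustration, and the input is assumed to be tab-delimited, which is what the default loader expects.

```pig
records = LOAD '/Users/Winter/School/st_incm.txt' AS (state:chararray, population:int);

-- Add a constant column so that every row ends up in the same group
-- (the name "dummy" is illustrative).
with_key = FOREACH records GENERATE state, population, 1 AS dummy;

-- Grouping on the constant yields a single group whose bag holds all rows.
grouped = GROUP with_key BY dummy;

-- MAX and MIN operate on a bag, so they are applied to the grouped bag;
-- FLATTEN turns the bag back into one output row per input row.
with_stats = FOREACH grouped GENERATE
    FLATTEN(with_key.(state, population)),
    MAX(with_key.population) AS max_pop,
    MIN(with_key.population) AS min_pop;

DUMP with_stats;
```

The grouping step is there because MAX and MIN are aggregate functions that take a bag of values rather than a single field of the current row.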
Thanks in advance for your help.
hadoop apache-pig
Winter