I'm not sure how you plan to compute the percentage, but if you know that your table has 100 rows, you can sort it with ORDER and then use the LIMIT operator to get the top 10%, for example:
    A = load 'myfile' as (t, u, v);
    B = order A by t;
    C = limit B 10;
(The above is an example from http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+LIMIT+Operator )
As for a dynamic limit of up to 10%, I’m not sure you can do it without knowing how big the table is, and I’m quite sure you can't do it in a UDF. You will need to run one job to count the number of rows, and then another job to execute the LIMIT query.
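A minimal sketch of that two-job approach, assuming the same input as the example above and Pig's parameter substitution. The first script counts the rows; the second takes the limit as a parameter `$n`. The script names `count_rows.pig` and `top_percent.pig` and the output paths `row_count` and `top_rows` are hypothetical.

```pig
-- Job 1 (hypothetical count_rows.pig): count every row of the input.
A = load 'myfile' as (t, u, v);
G = group A all;                        -- put all rows into a single group
N = foreach G generate COUNT_STAR(A);   -- COUNT_STAR counts rows even if t is null
store N into 'row_count';
```

```pig
-- Job 2 (hypothetical top_percent.pig): run as
--   pig -param n=<limit> top_percent.pig
-- where <limit> is 10% of the count from job 1, rounded down.
A = load 'myfile' as (t, u, v);
B = order A by t;
C = limit B $n;                         -- $n is substituted before the script runs
store C into 'top_rows';
```

Between the two runs you read the count back (for example with `hadoop fs -cat row_count/part-*`), compute 10% of it rounded down so the result never exceeds 10% of the table, and pass that value as `n`. COUNT_STAR is used instead of COUNT because COUNT skips rows whose first field is null.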