Grouping map values by key in Apache Pig

I am new to Pig scripts. Say we have a file:

    [a#1,b#2,c#3]
    [a#4,b#5,c#6]
    [a#7,b#8,c#9]

Pig script:

    A = LOAD 'txt' AS (in: map[]);
    B = FOREACH A GENERATE in#'a';
    DUMP B;

We know that we can access map values by key. In the example above, I pulled the values stored under the key "a". Assuming I don't know the keys in advance, I want to group the values by key across the whole relation and dump the result:

    (a,{1,4,7})
    (b,{2,5,8})
    (c,{3,6,9})

Does Pig have a built-in operation for this, or do I need to write a UDF? Please help me with this. Thanks.

1 answer

You can create a custom UDF that converts the map into a bag of (key, value) tuples (using Pig v0.10.0):

    package com.example;

    import java.io.IOException;
    import java.util.Map;
    import java.util.Map.Entry;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class MapToBag extends EvalFunc<DataBag> {

        private static final BagFactory bagFactory = BagFactory.getInstance();
        private static final TupleFactory tupleFactory = TupleFactory.getInstance();

        @Override
        public DataBag exec(Tuple input) throws IOException {
            try {
                // The first (and only) argument is the map field.
                @SuppressWarnings("unchecked")
                Map<String, Object> map = (Map<String, Object>) input.get(0);
                DataBag result = null; // a null map yields a null bag
                if (map != null) {
                    result = bagFactory.newDefaultBag();
                    // Emit one (key, value) tuple per map entry.
                    for (Entry<String, Object> entry : map.entrySet()) {
                        Tuple tuple = tupleFactory.newTuple(2);
                        tuple.set(0, entry.getKey());
                        tuple.set(1, entry.getValue());
                        result.add(tuple);
                    }
                }
                return result;
            } catch (Exception e) {
                throw new RuntimeException("MapToBag error", e);
            }
        }
    }

Then:

    B = foreach A generate flatten(com.example.MapToBag(in)) as (k:chararray, v:chararray);
    describe B;
    B: {k: chararray,v: chararray}

Now group by key and use a nested foreach to keep only the values:

    C = foreach (group B by k) {
        value = foreach B generate v;
        generate group as key, value;
    };
    dump C;
    (a,{(1),(4),(7)})
    (b,{(2),(5),(8)})
    (c,{(3),(6),(9)})
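If you want to sanity-check the logic without a Pig installation, the same flatten-then-group idea can be sketched in plain Java. This is only an illustration, not part of the UDF: the class name `GroupByKeySketch` and the helper `row` are made up for the example, and it uses ordinary `java.util` collections instead of Pig bags and tuples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByKeySketch {

    // Walk every map entry (the "flatten" step) and collect the values
    // per key (the "group" step). TreeMap keeps the keys sorted so the
    // printed result is deterministic.
    static Map<String, List<Object>> groupByKey(List<Map<String, Object>> records) {
        Map<String, List<Object>> grouped = new TreeMap<>();
        for (Map<String, Object> record : records) {
            if (record == null) {
                continue; // skip null maps, like the null check in the UDF
            }
            for (Map.Entry<String, Object> entry : record.entrySet()) {
                grouped.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                       .add(entry.getValue());
            }
        }
        return grouped;
    }

    // Hypothetical helper: builds one record shaped like [a#..,b#..,c#..].
    static Map<String, Object> row(Object a, Object b, Object c) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("a", a);
        m.put("b", b);
        m.put("c", c);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(row("1", "2", "3"));
        records.add(row("4", "5", "6"));
        records.add(row("7", "8", "9"));
        // prints {a=[1, 4, 7], b=[2, 5, 8], c=[3, 6, 9]}
        System.out.println(groupByKey(records));
    }
}
```

One difference to keep in mind: the list here preserves input order, whereas a Pig bag is unordered, so the values inside each bag in `C` may come out in a different order.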

Source: https://habr.com/ru/post/925661/

