Pig: apply a FOREACH statement to each item in a bag

Example: I have a class relation with a bag of students in it:

class: {teacher_name: chararray,students: {(firstname: chararray, lastname: chararray)}}

I want to perform an operation on each student while leaving the overall structure untouched, i.e. to get:

 class: {teacher_name: chararray,students: {(fullname: chararray)}}

where, for each student, fullname = CONCAT(firstname, lastname).

My understanding is that a nested FOREACH will not be my solution here, as it still generates only one entry per input tuple, while I want something applied to every element of the bag.

This is pretty easy to do with a UDF, but I wondered whether it is possible in pure Pig Latin.

1 answer

In Pig 0.10, this is possible without a UDF, since a FOREACH can be nested inside another FOREACH. Here is an example:

    inpt = load '~/pig/data/bag_concat.dat' as (k : chararray, c1 : chararray, c2 : chararray);
    dump inpt;

    1   q   w
    1   s   d
    2   q   a
    2   t   y
    2   u   i
    2   o   p

    bags = group inpt by k;
    describe bags;

    bags: {group: chararray,inpt: {(k: chararray,c1: chararray,c2: chararray)}}

    result = foreach bags {
        -- the inner foreach iterates only over the records of the inpt bag
        concat = foreach inpt generate CONCAT(c1, c2);
        generate group, concat;
    };
    dump result;

    (1,{(qw),(sd)})
    (2,{(qa),(ty),(ui),(op)})
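The same pattern can be applied to the schema from the question. This is a minimal sketch rather than a tested script: the file name 'class.dat' is hypothetical, and it assumes the file stores each students bag in Pig's text form (for example {(a,b),(c,d)}) so that the default loader can parse it.

```pig
-- Hypothetical input file; the schema is the one given in the question.
class = LOAD 'class.dat'
        AS (teacher_name: chararray,
            students: {(firstname: chararray, lastname: chararray)});

result = FOREACH class {
    -- The inner FOREACH runs once per tuple of the students bag,
    -- so every student gets its own fullname.
    fullnames = FOREACH students GENERATE CONCAT(firstname, lastname) AS fullname;
    -- Still one output tuple per class; only the bag contents change.
    GENERATE teacher_name, fullnames AS students;
};

DESCRIBE result;
```

The outer FOREACH still emits one tuple per class, which is exactly the "structure untouched" requirement. CONCAT joins the two strings with no separator; if a space is wanted, nesting the call as CONCAT(firstname, CONCAT(' ', lastname)) should work.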

Source: https://habr.com/ru/post/923643/
