I have a Pig script in which I load a dataset, split it into two separate datasets, do some calculations on each, and finally add another calculated field to each one. Now I want to combine these two datasets.
A = LOAD '/user/hdfs/file1' AS (a:int, b:int);
A1 = FILTER A BY a > 100;
A2 = FILTER A BY a <= 100 AND b > 100;
-- Now I do some calculation on A1 and A2
So, essentially, after the calculations, here is the schema of each:
{A1: {a:int, b:int, type:chararray}}
{A2: {a:int, b:int, type:chararray}}
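To make the calculation step concrete, it could be sketched roughly as below. This is only an illustration that assumes the extra field is a constant label per subset; the literals 'high_a' and 'high_b' are placeholders and not the values from my actual script.

```pig
-- Hypothetical stand-in for the real calculation: tag each subset with a label.
-- 'high_a' and 'high_b' are placeholder values chosen for illustration only.
A1 = FOREACH A1 GENERATE a, b, 'high_a' AS type:chararray;
A2 = FOREACH A2 GENERATE a, b, 'high_b' AS type:chararray;
```

After this step both relations have the same three fields in the same order.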
Now, before I store the result back to HDFS, I want to merge the two datasets back into one, something like UNION ALL in SQL. How can I do this?