Single Column Drop in Pig

I am filtering a table by a list of 20 identifiers. My code currently looks like this:

 A = LOAD 'ids.txt' USING PigStorage();
 B = LOAD 'massive_table' USING PigStorage();
 C = JOIN A BY $0, B BY $0;
 D = FOREACH C GENERATE $1, $2, $3, $4, ...
 STORE D INTO 'foo' USING PigStorage();

What I don't like is line D, where I have to generate a new relation just to get rid of the join column, explicitly listing every single column I want to keep (and sometimes that is a lot of columns). I am wondering whether there is something equivalent to:

 FILTER B BY $0 IN (A) 

or

 DROP $0 FROM C 
2 answers

This is perhaps similar to another question, which refers to a JIRA ticket: https://issues.apache.org/jira/browse/PIG-1693 . The ticket has examples of how you can use the .. notation to denote all remaining fields:

 D = FOREACH C GENERATE $1 .. ; 

This assumes you are running Pig 0.9.0 or later.
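
Applied to the script from the question, the change might look like the sketch below. It assumes that 'ids.txt' holds exactly one column, so that $0 of the joined relation is the identifier from A and $1 onward are the columns of B. If A has more columns, the starting position has to be shifted accordingly.

```pig
-- Paths are the ones used in the question.
A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();

-- Inner join keeps only the rows of B whose first field appears in A.
C = JOIN A BY $0, B BY $0;

-- Assumption: A contributes a single field ($0), so "$1 .." selects
-- every field that came from B, including B's own copy of the key.
D = FOREACH C GENERATE $1 ..;

STORE D INTO 'foo' USING PigStorage();
```

Note that this keeps B's copy of the join key, so D has the same columns as B, which is what the filter in the question is after.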


Drop a column by number

If you want to remove column $5 (positions are zero-based, so this is the sixth column), you can do it like this:

 D = FOREACH C GENERATE .. $4, $6 .. ; 
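
To check which fields survive, one option is to give the input a schema and inspect the result with DESCRIBE. The path 'input' and the field names c0 to c6 below are hypothetical, chosen only for illustration.

```pig
-- Hypothetical seven-column input; names are placeholders.
C = LOAD 'input' USING PigStorage() AS (c0, c1, c2, c3, c4, c5, c6);

-- ".. $4" selects $0 through $4, "$6 .." selects $6 through the last field,
-- so only $5 (c5) is left out.
D = FOREACH C GENERATE .. $4, $6 ..;

-- The printed schema should list c0 through c4 and c6, without c5.
DESCRIBE D;
```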

Drop a column by name

You cannot drop a column knowing only its own name. It is possible, however, if you know the names of the columns immediately before and after it. If you want to remove the column(s) between colBeforeMyCol and colAfterMyCol, you can do it like this:

 aliasAfter = FOREACH aliasBefore GENERATE .. colBeforeMyCol, colAfterMyCol ..; 
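
As a concrete sketch of the name-based form, assume a relation with the hypothetical schema below (the path and every field name are made up for the example). Dropping email then means naming its two neighbours:

```pig
-- Hypothetical input and schema, for illustration only.
users = LOAD 'users' USING PigStorage() AS (id, name, email, age, city);

-- ".. name" selects id and name; "age .." selects age and city.
-- email sits between the two ranges and is therefore dropped.
trimmed = FOREACH users GENERATE .. name, age ..;
```

The same statement would drop several columns at once if more than one field sat between name and age, so it is worth confirming the schema (for example with DESCRIBE) before relying on it.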

Source: https://habr.com/ru/post/1415391/

