The Pig documentation makes extensive use of the terms “inner bag” and “outer bag” (see, for example, http://pig.apache.org/docs/r0.11.1/basic.html ), yet I have not been able to find an exact definition that separates the two terms.
For example, a few closely related questions:
- If I give you a bag “foo”, what do you need to know to classify foo as an “inner bag” or an “outer bag”?
- Is any bag that is not the outermost bag an “inner bag”?
- Are the labels “inner” and “outer” always mutually exclusive?
- In Pig Latin, is every “bag” a relation, or is only the “outermost bag” a relation (so that inner bags are not relations)?
To make the discussion concrete, here is an example:
    grunt> dump A;
    (1,2,3)
    (4,2,1)
    (8,3,4)
    (4,3,3)
    grunt> W1 = GROUP A ALL;
    grunt> W2 = GROUP W1 ALL;
    grunt> W3 = GROUP W2 ALL;
    grunt> W4 = GROUP W3 ALL;
    grunt> describe W4;
    W4: {group: chararray,W3: {(group: chararray,W2: {(group: chararray,W1: {(group: chararray,A: {(f1: int,f2: int,f3: int)})})})}}
    grunt> illustrate W4;
    (1,2,3)
    ------------------------------------
    | A     | f1:int | f2:int | f3:int |
    ------------------------------------
    |       | 1      | 2      | 3      |
    |       | 8      | 3      | 4      |
    ------------------------------------

    -----------------------------------------------------------------
    | W1    | group:chararray | A:bag{:tuple(f1:int,f2:int,f3:int)} |
    -----------------------------------------------------------------
    |       | all             | {(1, 2, 3), (8, 3, 4)}              |
    -----------------------------------------------------------------

    -------------------------------------------------------------------------------------------------
    | W2    | group:chararray | W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})} |
    -------------------------------------------------------------------------------------------------
    |       | all             | {(all, {(1, 2, 3), (8, 3, 4)})}                                     |
    -------------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------------------------------------------
    | W3    | group:chararray | W2:bag{:tuple(group:chararray,W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})})} |
    ---------------------------------------------------------------------------------------------------------------------------------
    |       | all             | {(all, {(all, {(1, 2, 3), (8, 3, 4)})})}                                                            |
    ---------------------------------------------------------------------------------------------------------------------------------

    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    | W4    | group:chararray | W3:bag{:tuple(group:chararray,W2:bag{:tuple(group:chararray,W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})})})} |
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    |       | all             | {(all, {(all, {(all, {(1, 2, 3), (8, 3, 4)})})})}                                                                                   |
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------

    grunt> dump W4;
    (all,{(all,{(all,{(all,{(1,2,3),(4,2,1),(8,3,4),(4,3,3)})})})})
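To spell out the nesting I am asking about, here is a rough Python sketch of the same data. It is only my own model and not Pig itself: I am assuming a bag can be pictured as a list of tuples, and `group_all` and `bag_depth` are hypothetical helper names I made up to mimic `GROUP ... ALL` and to count bag levels.

```python
# Hypothetical model (not Pig): a bag is a Python list of tuples,
# a tuple is a Python tuple of fields.
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3)]


def group_all(bag):
    """Mimic `GROUP x ALL`: a new bag holding one tuple ("all", <input bag>)."""
    return [("all", bag)]


W1 = group_all(A)
W2 = group_all(W1)
W3 = group_all(W2)
W4 = group_all(W3)


def bag_depth(value):
    """Count bag (list) levels along the deepest path through the value."""
    if isinstance(value, list):
        return 1 + max((bag_depth(t) for t in value), default=0)
    if isinstance(value, tuple):
        return max((bag_depth(f) for f in value), default=0)
    return 0  # scalar field: no bag here


print(bag_depth(A))   # 1
print(bag_depth(W4))  # 5
```

In this model W4 is one bag that wraps four further levels of bags, which matches the shape of the `dump W4` output; what I cannot tell is which of those levels the documentation would call “inner” and which “outer”.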
Among the bags W1, W2, W3, and W4, which are inner and which are outer?