StrSplit Functions in Pig

Can someone explain how to get the result below in a Pig script?

My input file is below:

a.txt

aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data

I am writing a Pig script as follows:

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1)),a2,a3;

I do not know how to call STRSPLIT correctly. I need the output shown below: basically, all the characters after the dot in the first field.

(kyl,data,data)
(kkk,data,data)
(hj,data,data)
(dff,data,data)

Can someone give me the code for this?

+4
3 answers

Here is what you need to do -

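There is an escaping problem when Pig's parsing routines encounter a dot, because the dot is treated as an operator (see the documentation on the Dot Operator for more information). In addition, STRSPLIT interprets its delimiter as a regular expression, in which an unescaped dot matches any character.

You can use the Unicode escape sequence for the dot character: \u002E. The backslash must itself be escaped, and the delimiter must be written as a single-quoted string.

The code below will work: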

A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\u002E')) as (a1:chararray, a1of1:chararray),a2,a3;
C = FOREACH B GENERATE a1of1,a2,a3;
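
The same split can probably also be written with a backslash-escaped dot. The following is only a sketch: it assumes every a1 value contains at least one dot, and the field names before_dot and after_dot are made up for illustration. The third argument to STRSPLIT is the optional limit, so the value is cut at the first dot only and anything after it (including further dots) stays in one piece.

```pig
-- Sketch: split a1 at the first literal dot and keep the right-hand part.
-- Assumption: every a1 value contains at least one dot.
A = LOAD 'a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray, a3:chararray);
-- '\\.' is the regular expression for a literal dot; the limit 2 caps the split at two pieces.
B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1, '\\.', 2)) AS (before_dot:chararray, after_dot:chararray), a2, a3;
-- before_dot and after_dot are hypothetical names chosen for this example.
C = FOREACH B GENERATE after_dot, a2, a3;
DUMP C;
```

With the sample a.txt from the question, this should produce the tuples the question asks for.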

+7

You can also do this without STRSPLIT(), using SUBSTRING and INDEXOF:

A = LOAD 'C:\\Users\\Ren\\Desktop\\file' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);

B = foreach A generate SUBSTRING(a1,INDEXOF(a1,'.',0)+1,(int)SIZE(a1)),a2,a3;
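
To make the index arithmetic concrete, here is a worked sketch for the first input row. It loads the a.txt file from the question rather than the Windows path above, and after_dot is a hypothetical alias added for readability.

```pig
-- Worked example for a1 = 'aaa.kyl' (indexes are zero-based):
--   INDEXOF(a1, '.', 0)  -> 3      position of the first dot
--   3 + 1                -> 4      first character after the dot
--   (int)SIZE(a1)        -> 7      string length, used as the exclusive stop index
--   SUBSTRING(a1, 4, 7)  -> 'kyl'
A = LOAD 'a.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray, a3:chararray);
-- after_dot is a hypothetical alias, not part of the original answer.
B = FOREACH A GENERATE SUBSTRING(a1, INDEXOF(a1, '.', 0) + 1, (int)SIZE(a1)) AS after_dot, a2, a3;
DUMP B;
```

Note that if a value contains no dot, INDEXOF should return -1, so the start index becomes 0 and the whole value is returned unchanged.
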
+2
A = LOAD 'a.txt' USING PigStorage(',') AS(a1:chararray,a2:chararray,a3:chararray);

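B = FOREACH A GENERATE FLATTEN(STRSPLIT(a1,'\\.',2)),a2,a3;

This divides a1 into two parts, the text before the dot and the text after it. The dot is escaped because STRSPLIT treats its delimiter as a regular expression, in which a bare '.' matches any character. From the result you can select the part after the dot:

C = foreach B generate $1,$2,$3;

where $1 is the part after the dot.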

0