Using Pig to load a file

I am very new to Pig and I have what seems like a very simple problem. I have a line of code that reads:

A = load 'Sites/trial_clustering/shortdocs/*' AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray); 

where each file is basically a single line of four comma-separated words. However, Pig does not split it into four words. When I dump A, I get: (Money, coins, loans, debt,,,) I searched on Google but could not find what format my file should have for Pig to interpret it correctly. Please help!

1 answer

Your problem is that Pig, by default, loads tab-delimited files, not comma-delimited ones. As a result, the whole line "Money, coins, loans, debt" ends up in your first column, word1. When you print it, you get the illusion that you have several columns, but in fact the first holds your entire line and the rest are null.

To fix this, tell PigStorage to split on commas:

 A = LOAD '...' USING PigStorage(',') AS (...); 
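
As a fuller sketch, assuming the same path and schema as in the question, the complete load might look like the script below. The TRIM step is my own addition, not something the question asked for: the sample line has a space after each comma, so without it the later fields would likely keep a leading space. The alias B is just an illustrative name.

```pig
-- Split each input line on commas instead of the default tab delimiter.
A = LOAD 'Sites/trial_clustering/shortdocs/*' USING PigStorage(',')
    AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

-- Assumption: the data has a space after each comma ("Money, coins, ..."),
-- so strip surrounding whitespace from every field.
B = FOREACH A GENERATE TRIM(word1) AS word1, TRIM(word2) AS word2,
                       TRIM(word3) AS word3, TRIM(word4) AS word4;

-- Check the schema and the parsed rows.
DESCRIBE B;
DUMP B;
```

If the four fields now print as separate values rather than one long string followed by empty columns, the delimiter is being applied correctly.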