I am trying to analyze delimited data files generated by our services, using Pig on Amazon Elastic MapReduce. Everything is going well, except that all of our data files contain a header row that defines the purpose of each column. Obviously, the header strings cannot be cast to the numeric types of the data columns, so I get warnings from Pig like the following:
2011-03-17 22:49:55,378 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.PigStorage: Unable to interpret value [<snip>] in field being converted to double, caught NumberFormatException <For input string: "headerName"> field discarded
I have a filter after the load statement that tries to guarantee that I never operate on any header rows (by filtering out the header terms), but I would like to get rid of the warning noise so that it does not mask real problems (e.g. actual data fields that are malformed).
Is it possible?
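To illustrate the setup described above, here is a minimal sketch in Pig Latin. The bucket path, the tab delimiter, the relation names, and the field names are all placeholders for illustration, not our actual schema; only the header text `headerName` comes from the warning above.

```pig
-- Placeholder path, delimiter, and schema (assumptions for illustration only).
-- Declaring value as double makes PigStorage try to convert the header
-- text in that column, which is what triggers the warning.
raw = LOAD 's3://example-bucket/input/' USING PigStorage('\t')
      AS (name:chararray, value:double);

-- Filter placed after the load: drop the header row by matching the header
-- term in the string column. The conversion warning has already been logged
-- by the time this filter runs.
data = FILTER raw BY name != 'name';
```

The filter keeps header rows out of the results, but it does nothing about the warning, because the failed conversion to double happens when the declared schema is applied at load time.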