PigStorage handles compressed input by examining file names:
- *.bz2 / *.bz - org.apache.pig.bzip2r.Bzip2TextInputFormat
- Everything else uses org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat. This extends Hadoop's org.apache.hadoop.mapreduce.lib.input.TextInputFormat, which can handle .gz and Snappy files if you have the codecs installed.
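Since the input format is picked from the file name, a load statement should not need any compression-specific option. A minimal sketch, assuming tab-delimited data and hypothetical paths (neither appears in the original answer):

```
-- Hypothetical paths and delimiter, for illustration only.
-- .bz2 extension: expected to be read through Bzip2TextInputFormat
bz_rows = LOAD 'input/events.bz2' USING PigStorage('\t');

-- .gz extension: expected to be read through PigTextInputFormat,
-- which relies on the gzip codec being available to Hadoop
gz_rows = LOAD 'input/events.gz' USING PigStorage('\t');
```

Both statements are identical apart from the path; only the extension differs.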
Output compression is controlled through two properties:
- output.compression.enabled - true / false
- output.compression.codec - class name of the codec to use (org.apache.hadoop.io.compress.GzipCodec for gzip)
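One way to set these properties is with SET statements at the top of the script. A minimal sketch, assuming gzip output, tab-delimited data, and hypothetical input and output paths:

```
-- Turn on output compression and pick the gzip codec.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

-- Hypothetical paths and delimiter, for illustration only.
rows = LOAD 'input/events.txt' USING PigStorage('\t');
STORE rows INTO 'output/events_gz' USING PigStorage('\t');
```

The same properties could presumably also be passed on the command line or through the Pig properties file; the script form just keeps the setting next to the STORE it affects.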
If you feel up to reading it, the source of PigStorage.java is worth a look for the details.
Chris White