Is gzip compression/decompression transparent in Hadoop / Pig?

I read somewhere that Hadoop has built-in support for compression and decompression, but I think that only applies to the mapper output (enabled by setting some properties)?

Are there any Pig load/store functions I can use to read compressed data or to write output in compressed form?

PigStorage handles compressed input by examining file names:

  • *.bz2 / *.bz - org.apache.pig.bzip2r.Bzip2TextInputFormat
  • Everything else uses org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat. This extends org.apache.hadoop.mapreduce.lib.input.TextInputFormat, which can handle .gz and Snappy files if you have the corresponding codecs installed.
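To make the "examine the file name" idea concrete, here is a minimal stdlib-only sketch. It is not Pig's or Hadoop's code: the class and its helpers (inputFormatFor, readLines, gzip) are hypothetical. In real Hadoop the codec lookup by file suffix is done by CompressionCodecFactory; here java.util.zip stands in for the gzip codec, to show that a reader wrapped this way sees the same lines whether or not the file is compressed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedInputSketch {

    // Mirrors the dispatch described in the answer: *.bz2 / *.bz go to the
    // bzip2 input format, everything else to PigTextInputFormat.
    static String inputFormatFor(String fileName) {
        if (fileName.endsWith(".bz2") || fileName.endsWith(".bz")) {
            return "org.apache.pig.bzip2r.Bzip2TextInputFormat";
        }
        return "org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat";
    }

    // Hypothetical helper: wraps the raw stream in a gzip decoder when the
    // file name ends in .gz, so the caller only ever sees plain text lines.
    static List<String> readLines(String fileName, InputStream raw) throws IOException {
        InputStream in = fileName.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Hypothetical helper: gzip-compresses a string, used to build test input.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "a\t1\nb\t2\n";
        List<String> plain = readLines("part-00000",
            new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        List<String> fromGz = readLines("part-00000.gz",
            new ByteArrayInputStream(gzip(text)));
        System.out.println(plain.equals(fromGz)); // true
        System.out.println(inputFormatFor("logs.bz2"));
    }
}
```

Dispatching on the extension keeps the load function itself unaware of compression, which is why the answer describes reading compressed input as transparent.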

Compressed output is controlled by two properties:

  • output.compression.enabled - true / false
  • output.compression.codec - class name of the codec to use (org.apache.hadoop.io.compress.GzipCodec for gzip)
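The following is a minimal sketch of how a writer might consult those two properties. It is not Pig's implementation: shouldCompress and openOutput are hypothetical helpers, java.util.Properties stands in for the job configuration, and only the gzip codec is modelled (via java.util.zip). Any other codec name is rejected rather than guessed at.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedOutputSketch {

    static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";

    // Hypothetical helper: reads output.compression.enabled (true / false).
    static boolean shouldCompress(Properties props) {
        return Boolean.parseBoolean(props.getProperty("output.compression.enabled", "false"));
    }

    // Hypothetical helper: wraps the raw stream according to the two
    // properties. Only GzipCodec is modelled in this sketch.
    static OutputStream openOutput(Properties props, OutputStream raw) throws IOException {
        if (!shouldCompress(props)) {
            return raw;
        }
        String codec = props.getProperty("output.compression.codec");
        if (GZIP_CODEC.equals(codec)) {
            return new GZIPOutputStream(raw);
        }
        throw new UnsupportedOperationException("codec not modelled in this sketch: " + codec);
    }

    static byte[] write(Properties props, String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream out = openOutput(props, buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("output.compression.enabled", "true");
        props.setProperty("output.compression.codec", GZIP_CODEC);

        byte[] out = write(props, "a\t1\n");
        // gzip streams start with the magic bytes 0x1f 0x8b
        boolean gzipMagic = (out[0] & 0xff) == 0x1f && (out[1] & 0xff) == 0x8b;
        String back = new String(
            new GZIPInputStream(new ByteArrayInputStream(out)).readAllBytes(),
            StandardCharsets.UTF_8);
        System.out.println(gzipMagic + " " + back.equals("a\t1\n")); // true true
    }
}
```

In an actual Pig script these properties would typically be set with the SET command before the STORE statement; the sketch only illustrates the on/off switch plus codec-by-class-name design.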

If you feel like digging through PigStorage.java, you may find the details interesting.
