I have a Hive table defined like this:
    CREATE TABLE beacons (
        foo string,
        bar string,
        foonotbar string
    )
    COMMENT "Digest of daily beacons, by day"
    PARTITIONED BY (day string COMMENT "In YYYY-MM-DD format");
To populate it, I run something like:
    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;
    INSERT OVERWRITE TABLE beacons PARTITION (day = "2011-01-26")
    SELECT someFunc(query, "foo") AS foo,
           someFunc(query, "bar") AS bar,
           otherFunc(query, "foo||bar") AS foonotbar
    FROM raw_logs
    WHERE day = "2011-01-26";
This creates a new partition whose individual output files are compressed with deflate (the default codec), but I would rather use the LZO codec.

Unfortunately, I'm not sure how to do this. I assume it is one of the many runtime parameters, or perhaps just an extra line in the CREATE TABLE DDL.
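For illustration, a minimal sketch of the runtime-parameter approach, under stated assumptions. In Hive the output codec is normally chosen per job through a Hadoop property rather than in the table DDL. This assumes the third-party hadoop-lzo library (classes under `com.hadoop.compression.lzo`) is installed on every node and registered in `io.compression.codecs`. It also assumes the older Hadoop property name `mapred.output.compression.codec`; on Hadoop 2+ the equivalent is `mapreduce.output.fileoutputformat.compress.codec`. Verify both against your cluster before relying on this.

```sql
-- Assumption: hadoop-lzo is installed cluster-wide and listed in io.compression.codecs.
SET hive.exec.compress.output=true;
-- Only affects SequenceFile tables; harmless for the default TEXTFILE format.
SET io.seqfile.compression.type=BLOCK;
-- Assumption: pre-Hadoop-2 property name; LzopCodec writes .lzo files.
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

INSERT OVERWRITE TABLE beacons PARTITION (day = "2011-01-26")
SELECT someFunc(query, "foo") AS foo,
       someFunc(query, "bar") AS bar,
       otherFunc(query, "foo||bar") AS foonotbar
FROM raw_logs
WHERE day = "2011-01-26";
```

Because these are session-level `SET` statements, they apply only to the queries that follow in the same session and leave the table definition unchanged.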
David