OK! So I decided to use Parquet as the storage format for my Hive tables, and before implementing it on my cluster I ran some tests. Surprisingly, Parquet was slower in my tests, contrary to the general idea that it is faster than plain text files.
Please note that I am using Hive-0.13 on MapR
Here is my workflow:
Table A
- Format: TextFile
- Table size: 2.5 GB

Table B
- Format: Parquet
- Table size: 1.9 GB
- [created with: create table B stored as parquet as select * from A]

Table C
- Format: Parquet with Snappy compression
- Table size: 1.9 GB
- [created with: create table C stored as parquet tblproperties ("parquet.compression"="SNAPPY") as select * from A]
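Spelled out as runnable HiveQL, the two CTAS statements referenced above would look roughly like this (a sketch, assuming the text table A already exists and no partitioning is involved; Hive 0.13 supports Parquet natively via STORED AS PARQUET):

    -- Table B: plain Parquet copy of the text table A
    CREATE TABLE B
    STORED AS PARQUET
    AS SELECT * FROM A;

    -- Table C: Parquet copy with Snappy compression requested via table properties
    CREATE TABLE C
    STORED AS PARQUET
    TBLPROPERTIES ("parquet.compression"="SNAPPY")
    AS SELECT * FROM A;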
Here are the results of running the same queries against tables A and B. For each run the figures are, as Hive reports them, the number of mappers, number of reducers, cumulative CPU, and time taken:

A (Text):    Mappers: 15, Reducers: 1, Cumulative CPU: 123.33 sec, Time taken: 59.057 sec
B (Parquet): Mappers: 8,  Reducers: 1, Cumulative CPU: 204.92 sec, Time taken: 50.33 sec

A (Text):    Mappers: 15, Reducers: 0, Cumulative CPU: 51.18 sec, Time taken: 25.296 sec
B (Parquet): Mappers: 8,  Reducers: 0, Cumulative CPU: 117.08 sec, Time taken: 27.448 sec

A (Text):    Mappers: 15, Reducers: 0, Cumulative CPU: 57.55 sec, Time taken: 20.254 sec
B (Parquet): Mappers: 8,  Reducers: 0, Cumulative CPU: 113.97 sec, Time taken: 27.678 sec

A (Text):    Mappers: 15, Reducers: 0, Cumulative CPU: 57.55 sec, Time taken: 20.254 sec
B (Parquet): Mappers: 8,  Reducers: 0, Cumulative CPU: 113.97 sec, Time taken: 27.678 sec

A (Text):    Mappers: 15, Reducers: 1, Cumulative CPU: 127.85 sec, Time taken: 29.68 sec
B (Parquet): Mappers: 8,  Reducers: 1, Cumulative CPU: 255.2 sec, Time taken: 41.025 sec
As you can see, in every run the Parquet table burns noticeably more cumulative CPU than the text table, while its elapsed time is better in only one of the runs and worse in the others.
Table C (Parquet with Snappy compression) did not do any better than the TextFile table either.
Am I doing something wrong here, or is this the expected behaviour?
Thanks!
Update: I also tested ORC, both plain and with Snappy compression.
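The post does not show how the ORC tables were created; assuming they were built from table A the same way as B and C, the DDL would be along these lines (the table names here are made up for illustration):

    -- Hypothetical ORC copy of table A (ORC compresses with ZLIB by default)
    CREATE TABLE a_orc
    STORED AS ORC
    AS SELECT * FROM A;

    -- Hypothetical ORC copy using Snappy compression instead of the default
    CREATE TABLE a_orc_snappy
    STORED AS ORC
    TBLPROPERTIES ("orc.compress"="SNAPPY")
    AS SELECT * FROM A;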
Cumulative CPU by storage format:

Query 1
- Text: 123.33 sec
- Parquet: 204.92 sec
- ORC: 119.99 sec
- ORC with Snappy: 107.05 sec

Query 2
- Text: 127.85 sec
- Parquet: 255.2 sec
- ORC: 120.48 sec
- ORC with Snappy: 98.27 sec

Query 3
- Text: 128.79 sec
- Parquet: 211.73 sec
- ORC: 165.5 sec
- ORC with Snappy: 135.45 sec

Query 4 (with a where clause)
- Text: 72.48 sec
- Parquet: 136.4 sec
- ORC: 96.63 sec
- ORC with Snappy: 82.05 sec
So ORC comes out clearly ahead of Parquet in my environment. Is there something wrong with my Parquet setup or tests, or is this simply how it performs?
Thanks!