Hive: CLUSTERED BY on more than one column

Question

I understand that when a Hive table is clustered by one column, Hive applies a hash function to that bucketing column and then puts the row into one of the buckets based on the result. There is one file for each bucket, i.e. if there are 32 buckets, then there are 32 files in HDFS.

What does it mean to cluster by more than one column? For example, let's say the table is declared with CLUSTERED BY (continent, country) INTO 32 BUCKETS.

How would the hash function be applied if there is more than one column?

How many files would be created? Would it still be 32?

hadoop hive

learninghuman Jun 16 '15 at 15:02

2 answers

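In general, the bucket number is determined by the expression hash_function(bucketing_column) mod num_buckets. (There is a "0x7FFFFFFF" in there too, but that is not that important.) The hash_function depends on the type of the bucketing column. For an int, it is easy: hash_int(i) == i. For example, if user_id were an int and there were 10 buckets, we would expect all user_ids that end in 0 to be in bucket 1, all user_ids that end in 1 to be in bucket 2, and so on. For other data types, it is a little trickier. In particular, the hash of a BIGINT is not the same as the BIGINT. The hash of a string or a complex data type will be some number derived from the value, but not anything humanly recognizable. For example, if user_id were a STRING, then the user_ids in bucket 1 would probably not end in 0. In general, distributing rows based on the hash gives an even distribution across the buckets.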

ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
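
The expression above can be tried out with a small Python sketch. It only mirrors the formula quoted in this answer for an int column; the helper names hash_int and bucket_index are made up for illustration and are not Hive APIs, and the exact hashing in a given Hive version may differ.

```python
def hash_int(i: int) -> int:
    """Assumed hash for an int column, as described above: hash_int(i) == i."""
    return i


def bucket_index(hash_value: int, num_buckets: int) -> int:
    """Zero-based bucket: (hash & 0x7FFFFFFF) mod num_buckets.

    The 0x7FFFFFFF mask clears the sign bit so the result is never negative.
    """
    return (hash_value & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    # With 10 buckets, the zero-based index equals the last digit of user_id.
    for user_id in (10, 20, 31, 41, 7):
        print(user_id, "->", bucket_index(hash_int(user_id), 10))
```

The index here is zero-based, so index 0 corresponds to what the answer calls "bucket 1".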

krishna.kadigari Jun 16 '15 at 18:20

Maddy RS · Accepted Answer · 2015-06-17T15:58:39+0000

Yes, the number of files will be 32.
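The hash function is computed over both columns together, effectively treating "continent, country" as a single value, and the result is then taken modulo the number of buckets.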
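
To see why the file count does not change, here is a rough Python sketch of bucketing on two columns. It assumes a Java-style scheme in which each column's hash is folded into a running value with a multiplier of 31; that scheme, and the helper names to_int32, hash_string, combined_hash and bucket_for_row, are illustrative assumptions rather than a statement of what any particular Hive version does.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like Java int arithmetic."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def hash_string(s: str) -> int:
    """Assumed string hash: h = 31 * h + byte, over the UTF-8 bytes."""
    h = 0
    for b in s.encode("utf-8"):
        h = to_int32(31 * h + b)
    return h


def combined_hash(values) -> int:
    """Fold the per-column hashes into one hash (assumed scheme)."""
    h = 0
    for v in values:
        h = to_int32(31 * h + hash_string(v))
    return h


def bucket_for_row(values, num_buckets: int) -> int:
    """One hash per row, so the result is always in [0, num_buckets)."""
    return (combined_hash(values) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    rows = [("Asia", "India"), ("Asia", "Japan"),
            ("Europe", "France"), ("Europe", "Spain")]
    for row in rows:
        print(row, "->", bucket_for_row(row, 32))
```

However many columns go into the hash, each row still ends up with a single bucket number between 0 and 31, which is why there are still 32 files.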

Hope it helps!
