I have a Hive bucketed table. It has 4 buckets.
CREATE TABLE user(user_id BIGINT, firstname STRING, lastname STRING)
COMMENT 'A bucketed copy of user_info'
CLUSTERED BY(user_id) INTO 4 BUCKETS;
First, I inserted some records into this table using the following query.
set hive.enforce.bucketing = true;
insert into user
select * from second_user;
After this operation, I see in HDFS that 4 files have been created for this table.
I then needed to insert another dataset into the user table, so I ran the following query.
set hive.enforce.bucketing = true;
insert into user
select * from third_user;
Now 4 more files have been added under the user table directory, so it has 8 files in total.
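To see how the rows from both inserts are spread over the files, a query along these lines might help. It is only a sketch: it assumes Hive's INPUT__FILE__NAME virtual column is available, and the backticks around user are an assumption in case that name is a reserved word in the Hive version in use.

```sql
-- Count rows per underlying HDFS file of the bucketed table.
-- INPUT__FILE__NAME is a Hive virtual column holding the file each row was read from.
-- Backticks around user are an assumption (the name may be reserved in some Hive versions).
SELECT INPUT__FILE__NAME AS bucket_file,
       COUNT(*)          AS row_count
FROM `user`
GROUP BY INPUT__FILE__NAME
ORDER BY bucket_file;
```

With 4 buckets and two inserts, this should list one row per file in the table directory, which makes it easier to see how many rows each insert wrote to each file.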
Is it fine to do multiple inserts like this into a bucketed table? Does it affect the bucketing of the table?
sunil