Hadoop - create an external table from multiple directories in HDFS

I have an external table in Hive that reads all the files under an HDFS location (/user/hive/storage/tableX).

Now suppose the data has been pre-partitioned, and all the earlier files are spread across several directories that follow the naming convention <dir_name>_<incNumber>, for example:

/user/hive/warehouse/split/
  ./dir_1/files...
  ./dir_2/files...
  ./dir_n/files...

How can I create another external table that covers all the files under /user/hive/warehouse/split?

Do I need to create an external table that is partitioned by subfolder (dir_x)?

Also, does this require some kind of Bash or shell script that creates/adds a partition for each subdirectory?

+5
2 answers

You need to create an external table partitioned by dir_x to access all the files in the separate folders.

CREATE EXTERNAL TABLE sample_table (col1 string,
                                    col2 string,
                                    col3 string,
                                    col4 string)
PARTITIONED BY (dir string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/split';

Then add the partitions as you would for any partitioned table:

ALTER TABLE sample_table ADD PARTITION(dir='dir_1')
LOCATION '/user/hive/warehouse/split/dir_1';
ALTER TABLE sample_table ADD PARTITION(dir='dir_2')
LOCATION '/user/hive/warehouse/split/dir_2';
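
On the question of a script: nothing requires one, but writing one statement per directory by hand gets tedious as the number of directories grows. A minimal sketch in shell, assuming the subdirectory names arrive one per line on standard input. The function name gen_add_partitions is hypothetical; the table name and base path are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: read subdirectory names (one per line) from stdin and
# print one ALTER TABLE ... ADD PARTITION statement per name.
#   $1 = table name
#   $2 = base HDFS location of the table
gen_add_partitions() {
    table=$1
    base=$2
    while IFS= read -r name; do
        # Skip blank lines in the input.
        [ -n "$name" ] || continue
        # IF NOT EXISTS keeps the statement harmless when the partition
        # has already been registered.
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dir='%s') LOCATION '%s/%s';\n" \
            "$table" "$name" "$base" "$name"
    done
}

# Example with the directory names given literally:
printf 'dir_1\ndir_2\n' | gen_add_partitions sample_table /user/hive/warehouse/split
```

On a real cluster the names would probably come from a listing of the base path (for instance hdfs dfs -ls, trimmed down to the last path component), and the generated statements could be saved to a file and run with hive -f. Both of those steps are assumptions about your setup and are not shown here.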

This approach works, but it has a drawback. If you later add a new folder (for example, dir_100) under the table's location, Hive will not pick it up automatically: you have to register it with another ALTER TABLE ... ADD PARTITION statement. I have not worked with Hive for about 10 months, so I'm not sure whether there is a better approach. If that is not a problem for you, you can use this one.
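
When a new folder such as dir_100 appears, one way to find the directories that are not yet registered is to compare the directory names with the output of SHOW PARTITIONS sample_table, which prints lines of the form dir=dir_1. A rough sketch, assuming both lists have already been saved to files; missing_partitions is a hypothetical name, and the temporary files below only stand in for real listings.

```shell
#!/bin/sh
# Hypothetical helper: print directory names that have no partition yet.
#   $1 = file with directory names, one per line (e.g. dir_1)
#   $2 = file with SHOW PARTITIONS output, one per line (e.g. dir=dir_1)
missing_partitions() {
    # First pass ($2): strip the "dir=" prefix and remember each partition.
    # Second pass ($1): print non-empty names that were not remembered.
    # FILENAME is compared with ARGV[1] so an empty partition file still works.
    awk 'FILENAME == ARGV[1] { sub(/^dir=/, ""); seen[$0] = 1; next }
         $0 != "" && !($0 in seen)' "$2" "$1"
}

# Example with stand-in data:
dirs=$(mktemp)
parts=$(mktemp)
printf 'dir_1\ndir_2\ndir_100\n' > "$dirs"
printf 'dir=dir_1\ndir=dir_2\n' > "$parts"
missing_partitions "$dirs" "$parts"    # prints dir_100
rm -f "$dirs" "$parts"
```

Each name printed can then be registered with an ALTER TABLE ... ADD PARTITION statement like the ones above.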

+4


hive> MSCK REPAIR TABLE sample_table; 


0
