I have data arranged in directories in a specific format (shown below) and I want to add it to a Hive table. I want to add all the data under the 2012 directory. All the names below are directory names, and the innermost directories (3rd level) contain the actual data files. Is there a way to select the data directly without changing this structure? Any pointers are appreciated.
/2012/
|
|---------2012-01
          |---------2012-01-01
          |---------2012-01-02
          |...
          |...
          |---------2012-01-31
|
|---------2012-02
          |---------2012-02-01
          |---------2012-02-02
          |...
          |...
          |---------2012-02-28
|
|---------2012-03
|...
|...
|---------2012-12
Queries I have tried so far, with no luck:
CREATE EXTERNAL TABLE sampledata
(datestr string, id string, locations string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/path/to/data/2012/*/*';

CREATE EXTERNAL TABLE sampledata
(datestr string, id string, locations string)
partitioned by (ystr string, ymstr string, ymdstr string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

ALTER TABLE sampledata ADD PARTITION (ystr ='2012')
LOCATION '/path/to/data/2012/';
SOLUTION: This small parameter fixed my problem. Adding it to the question in case it is useful to others:
SET mapred.input.dir.recursive=true;
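For anyone trying this, here is a minimal sketch of how the setting might be combined with the first table definition above, with the wildcard removed from the LOCATION. It reuses the table name, columns, and path from the question. The two extra SET lines are assumptions: `mapreduce.input.fileinputformat.input.dir.recursive` is the name newer Hadoop releases use for the same property, and `hive.mapred.supports.subdirectories` is a Hive option that may also be needed depending on the Hive version. I have not verified which combination a given cluster requires.

```sql
-- Sketch only: assumes the table layout and paths from the question.

-- Let the input format descend into nested directories (the setting that worked for me).
SET mapred.input.dir.recursive=true;

-- Assumption: newer Hadoop releases use this name for the same property.
SET mapreduce.input.fileinputformat.input.dir.recursive=true;

-- Assumption: some Hive versions also need this before a table location
-- may contain subdirectories.
SET hive.mapred.supports.subdirectories=true;

-- Point the unpartitioned external table at the top-level directory (no wildcards).
CREATE EXTERNAL TABLE sampledata (
  datestr   STRING,
  id        STRING,
  locations STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/path/to/data/2012/';

-- Quick check that rows from the nested day directories are visible.
SELECT COUNT(*) FROM sampledata;
```

With this approach the whole year is read as one unpartitioned table. If pruning by day matters, each leaf directory could instead be registered as its own partition with ALTER TABLE ... ADD PARTITION ... LOCATION, at the cost of one statement per directory.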
hive partition
Yash sharma