I have a log file in HDFS whose values are separated by commas. For example:
2012-10-11 12:00,opened_browser,userid111,deviceid222
Now I want to load this file into a Hive table that has the columns "timestamp" and "action" and is partitioned by "userid" and "deviceid". How can I ask Hive to take the last two columns of the log file as the table's partitions? All the examples I have found, e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');", require specifying the partition values in the script, but I want the partitions to be picked up automatically from the values in the HDFS file.
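For context, a minimal sketch of the partitioned table described above (the table name is taken from the workaround below; the column types and the delimiter clause are assumptions):

CREATE TABLE first_table (
  `timestamp` STRING,
  action      STRING
)
PARTITIONED BY (userid STRING, deviceid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';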
The only solution I can see is to create an intermediate non-partitioned table with all four columns, load it from the file, and then run INSERT INTO TABLE first_table PARTITION (userid, deviceid) SELECT timestamp, action, userid, deviceid FROM intermediate_table;, but this is an extra step, and we end up with two very similar tables. Or should the intermediate table be created as an external table?
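To make that workaround concrete, here is a sketch of the staging-table approach, assuming all columns are strings and reusing the file path from the documentation example above. It relies on Hive's dynamic partitioning settings so that the partition values come from the SELECT result rather than hard-coded literals:

-- Non-partitioned staging table holding all four fields from the log file
CREATE TABLE intermediate_table (
  `timestamp` STRING,
  action      STRING,
  userid      STRING,
  deviceid    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Move the log file from its HDFS location into the staging table
LOAD DATA INPATH '/user/myname/kv2.txt' INTO TABLE intermediate_table;

-- Enable dynamic partitioning so the last columns of the SELECT
-- determine which partitions the rows land in
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE first_table PARTITION (userid, deviceid)
SELECT `timestamp`, action, userid, deviceid
FROM intermediate_table;

Making intermediate_table an EXTERNAL table pointed at the directory that already holds the log file would avoid the extra LOAD step, at the cost of keeping that second table definition around.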
hive loading
Valery Yesypenko