Loading data into a partitioned Hive table

I have a log file in HDFS whose values are separated by commas. For example:

2012-10-11 12:00,opened_browser,userid111,deviceid222

Now I want to load this file into a Hive table that has the columns "timestamp" and "action" and is partitioned by "userid" and "deviceid". How can I tell Hive to use the last two columns of the log file as the table's partition columns? All the examples, e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');", require the partition values to be spelled out in the script, but I want the partitions to be determined automatically from the contents of the HDFS file.

The only solution I see is to create an intermediate, non-partitioned table with all four columns, fill it from the file, and then run INSERT INTO first_table PARTITION (userid, deviceid) SELECT timestamp, action, userid, deviceid FROM intermediate_table;. But this is an extra step, and we end up with two very similar tables. Or should we create the intermediate table as an external table?

3 answers

Ning Zhang has an excellent answer on this topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables .

In short:

  • LOAD DATA simply copies (or moves) the files; it does not read them, so it cannot work out which partition each row belongs to.
  • The suggestion is to first load the data into a staging table (or use an external table that points to all the files), and then use a dynamic-partition insert to load it into the partitioned table.
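A minimal sketch of that approach in HiveQL, applied to the log format from the question. The table names (logs_stg, logs) and the HDFS directory /user/myname/logs/ are hypothetical. The timestamp column is kept as STRING because values such as "2012-10-11 12:00" have no seconds field and may not parse as a Hive TIMESTAMP. Using an external staging table leaves the raw files where they are, and dropping the staging table later does not delete them.

```sql
-- Hypothetical external staging table over the directory that holds the raw
-- comma-separated log files; dropping it later leaves the files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS logs_stg (
  `timestamp` STRING,
  action      STRING,
  userid      STRING,
  deviceid    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/myname/logs/';

-- Hypothetical target table: only timestamp and action are stored in the
-- data files; userid and deviceid become partition directories.
CREATE TABLE IF NOT EXISTS logs (
  `timestamp` STRING,
  action      STRING
)
PARTITIONED BY (userid STRING, deviceid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Allow dynamic partitions; nonstrict mode is needed because no partition
-- column is given a static value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the SELECT, in the same order
-- as in the PARTITION clause.
INSERT INTO TABLE logs PARTITION (userid, deviceid)
SELECT `timestamp`, action, userid, deviceid
FROM logs_stg;
```

Each distinct (userid, deviceid) pair becomes its own partition directory, so with many users and devices the insert may hit Hive's dynamic-partition limits (hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode), which would then need to be raised.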

I worked on the same scenario, but instead we created a separate HDFS data file for each partition we needed to load.

Since our data came from a MapReduce job, we used MultipleOutputs in our Reducer class to route each record to the file for its partition. After that, it is just a matter of generating a script that loads each file into the partition named after that HDFS file.
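Once each partition has its own file, the generated script is just a series of static-partition loads, one per file. A minimal sketch, assuming the reducer wrote only the timestamp and action fields into each file and that the partition values can be read from the file name; the table name logs and the output path are hypothetical:

```sql
-- Hypothetical target table, partitioned by the two ID columns.
CREATE TABLE IF NOT EXISTS logs (
  `timestamp` STRING,
  action      STRING
)
PARTITIONED BY (userid STRING, deviceid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- One statement per partition file (repeat for every file). The values in
-- PARTITION (...) are taken from the hypothetical file name by whatever
-- generates this script. LOAD DATA INPATH moves the file into the
-- partition's directory; it does not read or validate the rows.
LOAD DATA INPATH '/user/myname/out/userid111_deviceid222'
INTO TABLE logs PARTITION (userid='userid111', deviceid='deviceid222');
```

Because the partition values come from the file names rather than from the rows, nothing checks that the rows inside a file actually match its partition; that is the trade-off for skipping the staging table.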

  • As mentioned in @Denny Lee's answer, we need a staging table (invites_stg), either managed or external, and then an INSERT from the staging table into the partitioned table (invites in this case).

  • Make sure the following two properties are set: SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict;

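  • And finally, insert into invites: INSERT OVERWRITE TABLE invites PARTITION (state) SELECT col FROM invites_stg;, where col stands for the staging table's columns, with the dynamic partition column (state) listed last.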
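Put together for the log file from the question, the managed-staging variant might look like the sketch below. The table names logs_stg and logs and the file path are hypothetical, and the final TRUNCATE is just one way to keep the staging table from accumulating old batches, which partly addresses the concern about keeping two similar tables around.

```sql
-- Hypothetical managed staging table with all four columns from the file.
CREATE TABLE IF NOT EXISTS logs_stg (
  `timestamp` STRING,
  action      STRING,
  userid      STRING,
  deviceid    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical partitioned target table.
CREATE TABLE IF NOT EXISTS logs (
  `timestamp` STRING,
  action      STRING
)
PARTITIONED BY (userid STRING, deviceid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Moves the file from its current HDFS location into the staging table.
LOAD DATA INPATH '/user/myname/logs/log.csv' INTO TABLE logs_stg;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- With dynamic partitions, OVERWRITE replaces only the partitions that
-- appear in the query result; partition columns go last in the SELECT.
INSERT OVERWRITE TABLE logs PARTITION (userid, deviceid)
SELECT `timestamp`, action, userid, deviceid
FROM logs_stg;

-- Empty the staging table so the next batch is not inserted twice.
TRUNCATE TABLE logs_stg;
```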

See this link for help: http://www.edupristine.com/blog/hive-partitions-example
