Merge delta data into an external Hive table using the MERGE statement

I have an external table defined in Hive (v2.3.2 on EMR-5.11.0) that I need to update with new data about once a week. The merge is a conditional upsert.
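
To pin down what "conditional upsert" means here: a delta row whose key already exists replaces the existing row only when some condition holds (for example, the delta row is newer), and a delta row with a new key is inserted. The following is a minimal in-memory Python sketch of those semantics, assuming a hypothetical `id` key column and a hypothetical `updated_at` version column; it only illustrates the logic and is not how Hive executes a MERGE.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def conditional_upsert(base: Iterable[Row], delta: Iterable[Row],
                       key: str = "id", version: str = "updated_at") -> List[Row]:
    """Merge delta rows into base rows by key.

    `key` and `version` are hypothetical column names.
    - matched and delta row is newer -> update (replace the base row)
    - matched but delta row is older -> keep the base row
    - not matched                    -> insert the delta row
    """
    merged: Dict[Any, Row] = {row[key]: dict(row) for row in base}
    for row in delta:
        current = merged.get(row[key])
        if current is None:
            merged[row[key]] = dict(row)  # not matched: insert
        elif row[version] > current[version]:
            merged[row[key]] = dict(row)  # matched and condition holds: update
        # matched but condition fails: existing row is left untouched
    return sorted(merged.values(), key=lambda r: r[key])


base = [{"id": 1, "updated_at": 1, "v": "a"}, {"id": 2, "updated_at": 5, "v": "b"}]
delta = [
    {"id": 2, "updated_at": 3, "v": "stale"},  # older than base: ignored
    {"id": 1, "updated_at": 2, "v": "a2"},     # newer than base: updates
    {"id": 3, "updated_at": 1, "v": "c"},      # new key: inserted
]
print([r["v"] for r in conditional_upsert(base, delta)])  # -> ['a2', 'b', 'c']
```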

The table's location is in S3, and the data always stays there (the table was created once, and we only need to update it with new data).

I read this blog post about merging data in Hive using the ACID feature of transactional tables ( https://dzone.com/articles/update-hive-tables-the-easy-way-part-2-hortonworks ), but as far as I can see, the only solution is to copy my external table into a temporary internal (managed) Hive table that is clustered (bucketed) and transactional. Only on that table can I run the merge, and then I have to overwrite the original data with the newly merged data.
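
The workaround described above can be spelled out as a sequence of HiveQL statements. The sketch below is a small Python helper that only builds those statements as strings; it assumes an unpartitioned table, and the table names, column list, bucket count and match condition are hypothetical placeholders. The two `SET` lines are the client-side settings the Hive transactions documentation lists for ACID operations; a cluster may also need server-side (metastore/compactor) settings.

```python
from typing import List, Optional, Tuple


def staging_merge_statements(external_table: str, delta_table: str,
                             columns: List[Tuple[str, str]], key: str,
                             buckets: int = 16,
                             match_condition: Optional[str] = None) -> List[str]:
    """Build HiveQL for: copy -> MERGE in a transactional copy -> overwrite.

    All names here are hypothetical; `columns` is a list of (name, type) pairs.
    Assumes the external table is not partitioned.
    """
    staging = f"{external_table}_acid_staging"
    names = [name for name, _ in columns]
    col_defs = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    col_list = ", ".join(names)
    # Hive does not allow updating the bucketing column, so the key is excluded.
    set_clause = ", ".join(f"{n} = s.{n}" for n in names if n != key)
    insert_values = ", ".join(f"s.{n}" for n in names)
    matched = "WHEN MATCHED"
    if match_condition:
        matched += f" AND {match_condition}"
    return [
        "SET hive.support.concurrency=true",
        "SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
        # Transactional tables in Hive 2.x must be managed, bucketed and ORC.
        f"CREATE TABLE {staging} ({col_defs}) "
        f"CLUSTERED BY ({key}) INTO {buckets} BUCKETS STORED AS ORC "
        "TBLPROPERTIES ('transactional'='true')",
        # This full copy is the step that is expensive for a large table.
        f"INSERT INTO TABLE {staging} SELECT {col_list} FROM {external_table}",
        f"MERGE INTO {staging} AS t USING {delta_table} AS s ON t.{key} = s.{key} "
        f"{matched} THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT VALUES ({insert_values})",
        f"INSERT OVERWRITE TABLE {external_table} SELECT {col_list} FROM {staging}",
        f"DROP TABLE {staging}",
    ]


stmts = staging_merge_statements(
    "events", "events_delta",
    [("id", "BIGINT"), ("payload", "STRING"), ("updated_at", "TIMESTAMP")],
    key="id", match_condition="s.updated_at > t.updated_at",
)
for stmt in stmts:
    print(stmt + ";")
```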

This table is fairly large (about 10 GB of data), so I would like to avoid copying it before every merge.

Is there a way to create an internal table and map it onto the existing data? Or is there another way, besides the MERGE statement, to update external Hive tables?
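
For comparison, one commonly used pattern that does not rely on ACID at all is to rewrite the external table from a FULL OUTER JOIN of its current contents with the delta, picking the delta row per key when it wins. This still rewrites the whole table each time, but it avoids the separate copy into a transactional staging table. The Python helper below is only a sketch that builds such a statement as a string, assuming an unpartitioned table; the table names, columns and match condition are hypothetical. Hive stages query output before replacing the target, which is what allows the statement to read from the table it overwrites, but it is worth trying on a copy of the data first.

```python
from typing import List, Optional


def overwrite_upsert_statement(table: str, delta_table: str, columns: List[str],
                               key: str, match_condition: Optional[str] = None) -> str:
    """Build a non-ACID upsert as INSERT OVERWRITE ... FULL OUTER JOIN.

    Names are hypothetical; assumes `table` is not partitioned.
    """
    # Use the delta row when it exists and either has no base counterpart
    # or satisfies the (optional, hypothetical) match condition.
    use_delta = f"s.{key} IS NOT NULL"
    if match_condition:
        use_delta += f" AND (t.{key} IS NULL OR ({match_condition}))"
    select_list = ",\n  ".join(
        f"CASE WHEN {use_delta} THEN s.{c} ELSE t.{c} END AS {c}" for c in columns
    )
    return (
        f"INSERT OVERWRITE TABLE {table}\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {table} t FULL OUTER JOIN {delta_table} s ON t.{key} = s.{key}"
    )


print(overwrite_upsert_statement(
    "events", "events_delta", ["id", "payload", "updated_at"],
    key="id", match_condition="s.updated_at > t.updated_at",
))
```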

Many thanks!

hadoop emr hive acid orc