Hive - using NOT EXISTS with LEFT SEMI JOIN

I need to write a NOT IN query in Hive.

I have 3 tables A, B and C.

B has the fields PRODUCT, ID and VALUE. C has the fields ID and VALUE.

I need to write into table A the rows from table B that have no matching ID and VALUE in table C.

INSERT OVERWRITE TABLE A a SELECT * FROM B b LEFT SEMI JOIN C c ON (b.ID = c.ID AND b.VALUE = c.VALUE) where b.ID = NULL AND b.VALUE = NULL;

This suggestion from http://stackoverflow.com/questions/25041026/hive-left-semi-join-for-not-exists does not work, as I referenced the right table in the WHERE clause, which should not be done.

How can I write an equivalent query without referencing the right table in the WHERE clause?

Any other solution?

2 answers

Solution:

First, verify that the target table has all the fields from both tables, because SELECT * is used here.

Second, it should be b.VALUE IS NULL, not = NULL. A comparison with = NULL never evaluates to true, so it filters out every row.

The query should look like this:

 INSERT OVERWRITE TABLE A a SELECT * FROM B b LEFT SEMI JOIN C c ON (b.ID = c.ID AND b.VALUE = c.VALUE) where b.ID IS NULL AND b.VALUE IS NULL; 
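
If the goal is the NOT EXISTS behaviour described in the question, one common alternative is an anti-join built from a LEFT OUTER JOIN. A LEFT SEMI JOIN returns only the rows of B that do have a match in C and does not allow C to be referenced in the WHERE clause, whereas an outer join does. The following is a minimal sketch, not a tested solution; it assumes that A has the same three columns as B (PRODUCT, ID, VALUE), which the question does not state.

```sql
-- Anti-join sketch: keep the rows of B that have no (ID, VALUE) match in C.
-- Assumption: A has the same columns as B (PRODUCT, ID, VALUE).
INSERT OVERWRITE TABLE A
SELECT b.PRODUCT, b.ID, b.VALUE
FROM B b
LEFT OUTER JOIN C c
  ON (b.ID = c.ID AND b.VALUE = c.VALUE)
-- Rows of B without a match carry NULL in every column of C.
WHERE c.ID IS NULL;
```

Only the columns of B are selected, so the column count of the result does not depend on C.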

Hive supports IN, NOT IN, EXISTS and NOT EXISTS subqueries as of version 0.13.

 SELECT A.ID, A.* FROM A WHERE EXISTS (SELECT 1 FROM B WHERE A.ID = B.ID);

Subqueries in EXISTS and NOT EXISTS must have correlated predicates (for example, A.ID = B.ID in the example above). For more information, see the Hive wiki page on subqueries in the WHERE clause.
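
Applied to the tables from the question, the same idea could look like the sketch below. It assumes Hive 0.13 or later and that A has the same columns as B (PRODUCT, ID, VALUE); neither is confirmed in the question.

```sql
-- NOT EXISTS sketch: insert into A the rows of B with no matching row in C.
-- Assumptions: Hive 0.13+, and A has the columns PRODUCT, ID, VALUE.
INSERT OVERWRITE TABLE A
SELECT b.PRODUCT, b.ID, b.VALUE
FROM B b
WHERE NOT EXISTS (
  -- Correlated predicates tie the subquery to the current row of B.
  SELECT 1
  FROM C c
  WHERE b.ID = c.ID AND b.VALUE = c.VALUE
);
```

The right table appears only inside the subquery, so the outer WHERE clause never references C directly.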
