I have two tables in Hive/Impala. I want to pull the data from these tables into Spark as RDDs and perform a join operation.
I do not want to pass the join query directly to my Hive context — that is just an example. I have other use cases that are not possible with standard HiveQL: how do I get all the rows, access individual columns, and perform transformations?
Suppose I have two RDDs:
val table1 = hiveContext.hql("select * from tem1")
val table2 = hiveContext.hql("select * from tem2")
I want to join these RDDs on a column named account_id.
Ideally, I want to do something like the following query, but through the RDD API (a Spark wrapper) rather than SQL:
select * from tem1 join tem2 on tem1.account_id=tem2.account_id;
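For context, here is a plain-Scala sketch of the join semantics I'm after (Row and the sample data are made up; on real RDDs I imagine the same shape as keyBy on account_id followed by join):

```scala
// Placeholder row type standing in for rows from tem1/tem2.
case class Row(accountId: Int, value: String)

// Key the left side by account_id, then for each right row look up a match --
// the same shape as rdd1.keyBy(_.accountId).join(rdd2.keyBy(_.accountId)).
def innerJoin(left: Seq[Row], right: Seq[Row]): Seq[(Row, Row)] = {
  val byId = left.map(r => r.accountId -> r).toMap
  right.flatMap(r2 => byId.get(r2.accountId).map(r1 => (r1, r2)))
}

val tem1 = Seq(Row(1, "a1"), Row(2, "a2"))
val tem2 = Seq(Row(1, "b1"), Row(3, "b3"))
innerJoin(tem1, tem2).foreach(println) // only account_id 1 appears on both sides
```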