How to join tables in HBase

I need to join tables in HBase.

I have integrated Hive with HBase and it works well; I can query the tables using Hive.

But can anyone tell me how to join tables in HBase without using Hive? I think this can be achieved with MapReduce. If so, does anyone have a working example I can refer to?

Please share your opinions.

Here is the approach I have in mind:

If I need to join tables A x B x C, I can use TableMapReduceUtil to iterate over A and fetch the matching data from B and C inside the TableMapper, then use a TableReducer to write the result to another table, Y.
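A minimal in-memory sketch of this lookup join, written in Python rather than against the HBase Java API. Each table is modeled as a plain dict keyed by row key, and the b_id/c_id columns in A that point at rows of B and C are hypothetical. The CountingTable wrapper is also hypothetical; it only counts point lookups so the cost per scanned row is visible.

```python
from typing import Dict, Optional

# Hypothetical stand-in for an HBase table: row key -> {column: value}.
Table = Dict[str, Dict[str, str]]


class CountingTable:
    """Dict-backed 'table' that counts point lookups (the random reads a Get costs)."""

    def __init__(self, rows: Table) -> None:
        self.rows = rows
        self.gets = 0

    def get(self, key: str) -> Optional[Dict[str, str]]:
        self.gets += 1
        return self.rows.get(key)


def lookup_join(table_a: Table, table_b: CountingTable, table_c: CountingTable) -> Table:
    """Scan A in row-key order; for each row, fetch the matching rows of B and C."""
    table_y: Table = {}
    for a_key in sorted(table_a):  # an HBase scan returns rows sorted by row key
        a_row = table_a[a_key]
        b_row = table_b.get(a_row["b_id"])  # random read no. 1 (hypothetical b_id column)
        c_row = table_c.get(a_row["c_id"])  # random read no. 2 (hypothetical c_id column)
        if b_row is None or c_row is None:
            continue  # inner join: skip rows of A without a match in both B and C
        table_y[a_key] = {**a_row, **b_row, **c_row}  # the joined row destined for Y
    return table_y
```

Each scanned row of A costs two gets, one into B and one into C, whether or not the row ends up in Y, so joining two rows of A issues four random reads in total.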

Would this be a good approach?

1 answer

This is certainly one approach, but if you perform two random reads (one into B, one into C) for every row scanned from A, performance will suffer. If you filter the rows of A heavily, or A is a small data set, this may not be a problem.

Sort-Merge Join

However, the best approach, which will be available in HBase 0.96, is multiple-table input (MultiTableInputFormat). A single job scans table A and emits each row under a key that allows it to be matched with the corresponding row of table B.

For example, table A emits (b_id, a_info) and table B emits (b_id, b_info); records with the same b_id converge in the same reducer.

This is an example of a sort-merge join.
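A single-process Python sketch of this reduce-side join, assuming rows are dicts and using the hypothetical column names a_info, b_info and b_id from the example above. In a real job the two mapper inputs would come from scans of tables A and B; here the shuffle is simulated with sorted() plus itertools.groupby, which is what groups both sides of a key into one reduce call.

```python
from itertools import groupby
from operator import itemgetter


def map_a(rows_a):
    """Mapper for table A: re-key each row by its join key b_id and tag it 'A'."""
    for row in rows_a.values():
        yield row["b_id"], ("A", row["a_info"])


def map_b(rows_b):
    """Mapper for table B: its row key already is b_id; tag the row 'B'."""
    for b_id, row in rows_b.items():
        yield b_id, ("B", row["b_info"])


def shuffle(pairs):
    """Simulate the MapReduce shuffle: sort by key, then group equal keys."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_join(key, values):
    """Reducer: pair every A record with every B record that shares the key."""
    a_side = [info for tag, info in values if tag == "A"]
    b_side = [info for tag, info in values if tag == "B"]
    for a_info in a_side:
        for b_info in b_side:
            yield key, a_info, b_info


def sort_merge_join(rows_a, rows_b):
    """Run both mappers, shuffle their output, and reduce each key group."""
    mapped = list(map_a(rows_a)) + list(map_b(rows_b))
    return [out for key, values in shuffle(mapped) for out in reduce_join(key, values)]
```

Tagging each value with its source table is what lets the reducer tell the two sides apart; keys present on only one side produce no output, which gives inner-join semantics.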

Nested Loop Join

If you are joining on the row key, or the join attribute is sorted in the same order as table B's row keys, each task can hold a scanner on table B that reads forward sequentially until it finds the rows it is looking for.

For example, suppose table A's row key is "companyId" and table B's row key is "companyId_employeeId". Then for each company in table A you can fetch all of its employees with a nested loop join.

Pseudocode:

    for company in TableA:
        for employee in TableB:
            if employee.company_id == company.id:
                emit(company.id, employee)

This is an example of a nested loop join.
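A Python sketch of the sequential-scanner variant described above, where the inner loop never restarts because both tables are sorted. Tables are modeled as dicts (A: companyId -> name, B: "companyId_employeeId" -> employee name), and a list index stands in for the HBase scanner. It assumes company IDs are fixed-width (for example zero-padded) and contain no underscore; with variable-width IDs, "10_..." sorts before "1_..." byte-wise, so B's order would no longer match A's and the forward-only cursor would miss rows.

```python
from typing import Dict, List, Tuple


def scan_join(table_a: Dict[str, str], table_b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Join companies (A) to employees (B) with one forward-only cursor over B."""
    b_keys = sorted(table_b)  # HBase keeps rows sorted by row key
    pos = 0  # the "scanner" position in B; it never moves backwards
    out = []
    for company_id in sorted(table_a):
        # Skip B rows that belong to companies sorting before this one.
        while pos < len(b_keys) and b_keys[pos].split("_", 1)[0] < company_id:
            pos += 1
        # Emit every B row whose companyId prefix matches this company.
        while pos < len(b_keys) and b_keys[pos].split("_", 1)[0] == company_id:
            employee_id = b_keys[pos].split("_", 1)[1]
            out.append((company_id, employee_id, table_b[b_keys[pos]]))
            pos += 1
    return out
```

Because the cursor only moves forward, each row of B is read once, instead of once per company as in the literal double loop of the pseudocode; strictly speaking this makes it a merge over two sorted inputs.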

More detailed join algorithms are described here:
