Hive Explode / Lateral View Multiple Arrays

I have a hive table with the following schema:

COOKIE  | PRODUCT_ID | CAT_ID       | COL
1234123 | [1,2,3]    | [r, t, null] | [2,1, null]

How can I normalize the arrays to get the following result?

COOKIE  | PRODUCT_ID | CAT_ID | COL
1234123 | [1]        | [r]    | [2]
1234123 | [2]        | [t]    | [1]
1234123 | [3]        | null   | null

I tried the following:

select concat_ws('|',visid_high,visid_low) as cookie,
       pid,
       catid,
       qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid
lateral view explode(qty) ptable3 as qty

However, the result comes out as a Cartesian product of the three arrays.

hive explode hiveql
2 answers

You can use the numeric_range and array_index UDFs from Brickhouse ( http://github.com/klout/brickhouse ) to solve this problem. A blog post describes the approach in detail at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/

Using these UDFs, the query will look like this:

 select cookie,
        array_index( product_id_arr, n ) as product_id,
        array_index( catalog_id_arr, n ) as catalog_id,
        array_index( qty_id_arr, n ) as qty
 from table
 lateral view numeric_range( size( product_id_arr )) n1 as n;

I found a good solution to this problem that needs no UDF: posexplode, which returns each array element together with its position.

  SELECT COOKIE,
         ePRODUCT_ID,
         eCAT_ID,
         eQTY
  FROM TABLE
  -- posexplode emits (position, value) for every array element
  LATERAL VIEW posexplode(PRODUCT_ID) p AS seqp, ePRODUCT_ID
  LATERAL VIEW posexplode(CAT_ID) c AS seqc, eCAT_ID
  LATERAL VIEW posexplode(QTY) q AS seqq, eQTY
  -- keep only the rows where all three positions line up
  WHERE seqp = seqc AND seqc = seqq;
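
A possible variant of the same idea, offered as a sketch rather than a tested query: explode only one array with posexplode and read the other arrays by position. It assumes that Hive's `arr[n]` indexing accepts a column as the index and yields NULL when the position is past the end of the array. `TABLE`, `COOKIE`, `PRODUCT_ID`, `CAT_ID` and `QTY` are the placeholders used above; `p` and `pos` are names chosen here for illustration.

```sql
-- Sketch only: table and column names follow the placeholders used in this thread.
SELECT COOKIE,
       ePRODUCT_ID,
       CAT_ID[pos] AS eCAT_ID,  -- element of CAT_ID at the same position
       QTY[pos]    AS eQTY      -- element of QTY at the same position
FROM TABLE
-- a single lateral view, so no cross product is built and then filtered
LATERAL VIEW posexplode(PRODUCT_ID) p AS pos, ePRODUCT_ID;
```

With this form the number of output rows follows the length of PRODUCT_ID, so it fits best when PRODUCT_ID is the longest (or the driving) array.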