Comma-separated column values in Hive

This was asked and answered for SQL (Converting multiple lines to one with a comma as a separator). Would any of the approaches mentioned there work in Hive? For example, to turn this:

+------+------+
| Col1 | Col2 |
+------+------+
| a    | 1    |
| a    | 5    |
| a    | 6    |
| b    | 2    |
| b    | 6    |
+------+------+

into this:

+------+-------+
| Col1 | Col2  |
+------+-------+
| a    | 1,5,6 |
| b    | 2,6   |
+------+-------+
3 answers

The aggregate function collect_set can do what you are after. Here is the documentation. You can write a query like:

 SELECT Col1, collect_set(Col2) FROM your_table GROUP BY Col1; 

However, there is one notable difference between MySQL GROUP_CONCAT and Hive collect_set: GROUP_CONCAT keeps duplicates in its result, whereas collect_set removes them from the resulting array. Also note that collect_set returns an array, not a comma-separated string. In the example you showed, no group has duplicate values in Col2, so you can go ahead and use it.


There is also collect_list, which returns the complete list (with duplicates).
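To see the difference, a minimal sketch that puts both aggregates side by side, assuming the table from the question is called your_table (a placeholder name) and that your Hive version ships collect_list (as far as I know it was added in 0.13):

```sql
-- your_table is a placeholder for the table shown in the question.
-- collect_set drops duplicate Col2 values within each group;
-- collect_list keeps every value, duplicates included.
SELECT Col1,
       collect_set(Col2)  AS distinct_values,
       collect_list(Col2) AS all_values
FROM your_table
GROUP BY Col1;
```

With the sample data from the question both columns hold the same elements, because no group repeats a Col2 value. Neither function promises a particular element order.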


Try this:

 SELECT Col1, concat_ws(',', collect_set(Col2)) as col2 FROM your_table GROUP BY Col1; 

apache.org documentation
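One caveat: concat_ws accepts strings or an array of strings, so if Col2 is numeric (as it looks in the question) the query above will probably be rejected with an argument type error. A hedged sketch of a workaround, assuming Col2 is an INT and using a throwaway demo table whose name and column types are my own assumptions:

```sql
-- Hypothetical demo table; the name and the column types are assumptions.
CREATE TABLE IF NOT EXISTS your_table (Col1 STRING, Col2 INT);

-- INSERT ... VALUES needs a reasonably recent Hive (0.14 or later, as far as I know).
INSERT INTO TABLE your_table
VALUES ('a', 1), ('a', 5), ('a', 6), ('b', 2), ('b', 6);

-- Cast to STRING before collecting so that concat_ws receives array<string>.
-- sort_array makes the element order deterministic; on strings it sorts
-- lexicographically, so '10' would come before '2'.
SELECT Col1,
       concat_ws(',', sort_array(collect_set(CAST(Col2 AS STRING)))) AS Col2
FROM your_table
GROUP BY Col1;
```

Swap collect_set for collect_list if duplicates should be kept. If Col2 is already a STRING column, the cast is unnecessary.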

