Hive: combine column values into a comma-separated list

Question


This has been asked and answered for SQL (Converting multiple lines to one with a comma as a separator). Would any of the approaches mentioned there work in Hive? For example, going from this:

 +------+------+
 | Col1 | Col2 |
 +------+------+
 | a    | 1    |
 | a    | 5    |
 | a    | 6    |
 | b    | 2    |
 | b    | 6    |
 +------+------+

to this:

 +------+-------+
 | Col1 | Col2  |
 +------+-------+
 | a    | 1,5,6 |
 | b    | 2,6   |
 +------+-------+


hadoop hive

glp Mar 28 '14 at 6:05


3 answers

There is also collect_list, which returns the complete list (including duplicates).
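A minimal sketch of the collect_list variant, assuming the table is named your_table (the placeholder used in the other answers) and that Col2 is numeric. collect_list was added in Hive 0.13.0, and the CAST is there because concat_ws accepts only strings or array<string>.

```sql
-- Keeps every Col2 value in each group, duplicates included.
-- CAST to STRING because concat_ws does not accept array<int>.
SELECT Col1,
       concat_ws(',', collect_list(CAST(Col2 AS STRING))) AS col2
FROM your_table
GROUP BY Col1;
```

On the sample data this should give 1,5,6 for a and 2,6 for b, although Hive does not guarantee the order of elements within each list.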


Simon u Dec 30 '15 at 14:33


Try this:

 SELECT Col1, concat_ws(',', collect_set(Col2)) as col2 FROM your_table GROUP BY Col1;

See the apache.org documentation.
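Two caveats about this query, sketched under the assumption that Col2 is an INT column (the question does not state its type): concat_ws accepts only strings or array<string>, so an integer column needs a CAST, and collect_set makes no promise about element order. Wrapping the set in sort_array (available since Hive 0.9.0) makes the output order repeatable.

```sql
-- Cast first so concat_ws receives array<string>, then sort the
-- de-duplicated values so each group's output order is stable.
SELECT Col1,
       concat_ws(',', sort_array(collect_set(CAST(Col2 AS STRING)))) AS col2
FROM your_table
GROUP BY Col1;
```

Because the values are cast to STRING before sorting, the order is lexicographic ('10' sorts before '2'); for the single-digit sample data this yields 1,5,6 and 2,6.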


Anil Aug 18 '17 at 21:00


Neels · Accepted Answer · 2014-03-28T10:03:46+0000

The aggregate function collect_set can achieve what you want (see its documentation). You can write a query like:

 SELECT Col1, collect_set(Col2) FROM your_table GROUP BY Col1;

However, there is one important difference between MySQL's GROUP_CONCAT and Hive's collect_set: GROUP_CONCAT keeps duplicates in its result, whereas collect_set removes duplicate values from the array it returns. In your example there are no duplicate Col2 values within a group, so you can use it as is. Note also that collect_set returns an array rather than a string; wrapping it in concat_ws(',', ...) produces the comma-separated form.
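To check whether the de-duplication matters for your data, a sketch that compares both aggregates per group (your_table is the same placeholder name; collect_list requires Hive 0.13.0 or later):

```sql
-- n_all counts every Col2 value in the group;
-- n_distinct counts only the unique values that collect_set keeps.
SELECT Col1,
       size(collect_list(Col2)) AS n_all,
       size(collect_set(Col2))  AS n_distinct
FROM your_table
GROUP BY Col1;
```

On the question's sample data both columns match (3 for a, 2 for b). Any group where they differ is one where collect_set would drop values that GROUP_CONCAT keeps.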

