Which key class is suitable for secondary sorting?

In Hadoop, you can use the secondary sort mechanism to have the values sorted before they are sent to the reducer.

The way this is done in Hadoop is that you add the value to sort by to the key, and then plug custom group and key comparison methods into the sorting system.
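To make the mechanism concrete, here is a rough plain-Java simulation of what the shuffle does with such a key. It uses no Hadoop classes at all; `CompositeKey`, `SORT_ORDER` and `shuffle` are names made up for this sketch. Records are sorted on the full composite key, but grouped on the natural key only, so each group sees its values already in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the idea; this is not Hadoop code.
class SecondarySortSketch {

    // Hypothetical composite key: the real (natural) key plus the value to sort by.
    static final class CompositeKey {
        final String naturalKey;
        final int sortValue;

        CompositeKey(String naturalKey, int sortValue) {
            this.naturalKey = naturalKey;
            this.sortValue = sortValue;
        }
    }

    // "Key comparison": natural key first, then the sort value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.sortValue);

    // Sorts on the full key, then groups on the natural key only
    // ("group comparison"), which is what a reducer call would receive.
    static Map<String, List<Integer>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey, unused -> new ArrayList<>())
                  .add(k.sortValue);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("b", 1), new CompositeKey("a", 1));
        System.out.println(shuffle(mapOutput)); // prints {a=[1, 2], b=[1, 3]}
    }
}
```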

So you need a key that essentially consists of the real key plus the value to sort by. To do this fast enough, I need a way to create a composite key that is also easy to decompose into the separate parts needed by the group and key comparison methods.

What is the smartest way to do this? Is there a "ready-made" Hadoop class that can help me with this, or do I need to create a separate key class for each MapReduce step?

How do I do this if the key is itself a composite consisting of several parts (which are also needed separately because of the partitioner)?
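For a natural key with several parts, one way to think about it is that partitioning should look only at those parts and ignore the sort value, so that all records for one natural key end up at the same reducer. The sketch below is plain Java with made-up names (`customerId`, `region`, `timestamp`, `partitionFor`); the masking with `Integer.MAX_VALUE` mirrors how Hadoop's `HashPartitioner` keeps the result non-negative.

```java
import java.util.Objects;

// Plain-Java sketch; in a real job this logic would live in a Partitioner subclass.
class NaturalKeyPartitionSketch {

    // Hypothetical key layout: (customerId, region) is the natural key,
    // timestamp is the value to sort by and is deliberately ignored here.
    static int partitionFor(String customerId, String region,
                            long timestamp, int numPartitions) {
        int hash = Objects.hash(customerId, region);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int first = partitionFor("c1", "eu", 1L, 4);
        int second = partitionFor("c1", "eu", 99L, 4);
        System.out.println(first == second); // prints true
    }
}
```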

What do you guys recommend?

PS: I wanted to add the "secondary-sort" tag, but I don't have enough reputation yet.

4 answers

Write your own key class that implements WritableComparable and define the ordering in its compareTo method.
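A minimal sketch of such a key class, under some assumptions: the class name, the field names and the `roundTrip` helper are invented for illustration, and the class implements plain `Comparable` so that it compiles without Hadoop on the classpath. In a real job it would implement `org.apache.hadoop.io.WritableComparable` instead, whose `write(DataOutput)`, `readFields(DataInput)` and `compareTo` methods have the same shape as the ones shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for a WritableComparable key; Comparable is used only to stay Hadoop-free.
class CompositeKeySketch implements Comparable<CompositeKeySketch> {
    private String naturalKey = "";
    private long sortValue;

    // Hadoop instantiates keys reflectively, so a no-arg constructor is needed.
    CompositeKeySketch() {}

    CompositeKeySketch(String naturalKey, long sortValue) {
        this.naturalKey = naturalKey;
        this.sortValue = sortValue;
    }

    String getNaturalKey() { return naturalKey; }
    long getSortValue() { return sortValue; }

    // Serialization, same signature as Writable.write.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(naturalKey);
        out.writeLong(sortValue);
    }

    // Deserialization, same signature as Writable.readFields.
    public void readFields(DataInput in) throws IOException {
        naturalKey = in.readUTF();
        sortValue = in.readLong();
    }

    // Full ordering: natural key first, then the sort value.
    @Override
    public int compareTo(CompositeKeySketch other) {
        int byKey = naturalKey.compareTo(other.naturalKey);
        return byKey != 0 ? byKey : Long.compare(sortValue, other.sortValue);
    }

    // Grouping comparison: the natural key only, the sort value is ignored.
    static int compareForGrouping(CompositeKeySketch a, CompositeKeySketch b) {
        return a.naturalKey.compareTo(b.naturalKey);
    }

    // Hypothetical helper: serialize and deserialize a key to check both methods agree.
    static CompositeKeySketch roundTrip(CompositeKeySketch key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            CompositeKeySketch copy = new CompositeKeySketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an actual job the grouping logic would go into a comparator registered with `Job.setGroupingComparatorClass`, and a different sort order, if needed, into one registered with `Job.setSortComparatorClass`. Whether one such class can be reused across steps depends on whether the steps share the same key layout.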
