In Hadoop, you can use the secondary sorting mechanism to sort the values before sending them to the reducer.
The way this is done in Hadoop is that you add the value to sort by to the key, and then supply custom grouping and key comparison methods that hook into the sorting system.
So you need a composite key that consists of the real (natural) key plus the value to sort by. For this to be fast enough, I need a way to build a composite key that is also easy to decompose into the separate parts needed by the grouping and key comparison methods.
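To make clear what I mean, here is a minimal plain-Java sketch of the idea, with no Hadoop dependency. The names `CompositeKey`, `SecondarySortSketch`, `SORT_ORDER`, `GROUP_ORDER` and `sortAndGroup` are made up for illustration. As far as I understand, in a real job the key would implement `WritableComparable` and the two comparators would be registered through `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`, but the sketch only simulates the sort-then-group behaviour in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical composite key: the natural key plus the value to sort by.
final class CompositeKey {
    final String naturalKey; // the real key, used for grouping
    final long sortValue;    // the extra field, used only for ordering

    CompositeKey(String naturalKey, long sortValue) {
        this.naturalKey = naturalKey;
        this.sortValue = sortValue;
    }
}

final class SecondarySortSketch {
    // Plays the role of the sort comparator: natural key first, then sort value.
    static final Comparator<CompositeKey> SORT_ORDER =
        Comparator.comparing((CompositeKey k) -> k.naturalKey)
                  .thenComparingLong(k -> k.sortValue);

    // Plays the role of the grouping comparator: natural key only.
    static final Comparator<CompositeKey> GROUP_ORDER =
        Comparator.comparing((CompositeKey k) -> k.naturalKey);

    // Sorts the keys, then starts a new group whenever the natural key changes,
    // which is roughly how sorted records are split into reduce calls.
    static List<List<CompositeKey>> sortAndGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        List<List<CompositeKey>> groups = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            if (previous == null || GROUP_ORDER.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }
}
```

Calling `SecondarySortSketch.sortAndGroup(...)` on a mixed list returns one inner list per natural key, with the entries inside each list ordered by `sortValue`. What I am missing is the Hadoop-side equivalent of `CompositeKey` that does not have to be rewritten for every job.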
What is the smartest way to do this? Is there a ready-made Hadoop class that can help me with this, or do I need to create a separate key class for each MapReduce step?
How do I do this if the natural key is itself a composite consisting of several parts (which are also needed separately, because of the partitioner)?
What do you guys recommend?
P.S. I wanted to add the "secondary-sort" tag, but I don't have enough reputation yet.