Hadoop: difference between 0 reducer and identity reducer?

Question

I'm just trying to confirm my understanding of the difference between 0 reducers and the identity reducer.

0 reducers means that the reduce step is skipped and the mapper output is the final output.
An identity reducer means that shuffling / sorting still takes place?

+23

mapreduce hadoop

kee May 17 '12 at 5:44

4 answers

Another use case for the identity reducer is to combine all the results into a small number of output files (one per reducer). This can be convenient if you use Amazon Web Services to write directly to S3, especially if the mapper output is small (e.g. a grep / search for a record) and you have many mappers (e.g. 1000).
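
To make the file-count argument concrete, here is a rough sketch in plain Python (not Hadoop code). It assumes the usual behaviour that a map-only job writes one output file per map task, while a job with reducers writes one file per reduce task. The function name and the figure of 4 reducers are made up for illustration.

```python
def output_file_count(num_mappers, num_reducers):
    """Number of output files in this simplified model.

    With zero reducers every map task writes its own file;
    otherwise each reduce task writes exactly one file.
    """
    if num_reducers == 0:
        return num_mappers
    return num_reducers


# Hypothetical job sizes: 1000 mappers, either no reducers or 4 identity reducers.
map_only_files = output_file_count(num_mappers=1000, num_reducers=0)
identity_files = output_file_count(num_mappers=1000, num_reducers=4)
print(map_only_files, identity_files)
```

With many tiny mapper outputs, the second configuration leaves far fewer objects to write to S3, at the cost of a shuffle.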

+4

Dolan Antenucci Jul 05 2018-12-12T00:

It depends on your business requirements. If you are doing a wordcount, you need to reduce your map output to get a total result. If you just want to change the words to uppercase, you do not need a reduce step.
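
The two examples can be sketched in plain Python (a toy model, not Hadoop code; all function names are invented for illustration). Uppercasing touches each record independently, so the mapper output is already final. Counting needs all values for the same word brought together, which is what the reduce step does.

```python
from collections import defaultdict


def uppercase_mapper(line):
    # Each output record depends on a single input record: no reduce needed.
    return [word.upper() for word in line.split()]


def wordcount_mapper(line):
    # Emit (word, 1) for every word; totals are not known yet.
    return [(word, 1) for word in line.split()]


def wordcount_reducer(pairs):
    # Sum the counts per word across the output of all mappers.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


lines = ["a b a", "b a"]
upper = [w for line in lines for w in uppercase_mapper(line)]
pairs = [p for line in lines for p in wordcount_mapper(line)]
counts = wordcount_reducer(pairs)
print(upper)
print(counts)
```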

+3

nice2mu May 17 '12 at 8:17

The main difference between "no reducer" (mapred.reduce.tasks = 0) and the "standard reducer", which is IdentityReducer (mapred.reduce.tasks = 1, etc.), is that with "no reducer" there is no partitioning and no shuffling after the map stage. Therefore, in this case, you get the "pure" output of your mappers without further processing. This helps when developing and debugging mappers, though that is not its only use.
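
A small plain-Python simulation may help show what the extra stages do (it is only a model, not Hadoop code). The names `map_only`, `identity_reduce` and `stable_partition` are made up, and the byte-sum partitioner is a deterministic stand-in for Hadoop's hash partitioning.

```python
def run_mappers(splits):
    """Run a trivial mapper over each input split, emitting (word, 1) in input order."""
    return [[(word, 1) for word in split] for split in splits]


def map_only(splits):
    # Zero reducers: no partitioning, no shuffle, no sort.
    # The result is one output per mapper, exactly as emitted.
    return run_mappers(splits)


def stable_partition(key, num_reducers):
    # Assumption: a simple deterministic stand-in for a hash partitioner.
    return sum(key.encode("utf-8")) % num_reducers


def identity_reduce(splits, num_reducers):
    # Partition every mapper record by key, then sort each partition by key.
    partitions = [[] for _ in range(num_reducers)]
    for mapper_output in run_mappers(splits):
        for key, value in mapper_output:
            partitions[stable_partition(key, num_reducers)].append((key, value))
    # The identity reducer passes each record through unchanged.
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]


splits = [["pear", "apple"], ["lime", "apple"]]
print(map_only(splits))
print(identity_reduce(splits, num_reducers=2))
```

In the map-only result the records stay grouped by mapper and in input order. In the identity-reducer result, equal keys land in the same partition and each partition is sorted by key, even though no value was changed.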

+3

morsik Feb 11 '14 at 7:06

David Gruzman · Accepted Answer · 2012-05-17 08:35

Your understanding is correct. I would define it as follows: if you do not need to sort the map results, you set 0 reducers, and the job is called map-only.
If you need to sort the map results but do not need any aggregation, you choose the identity reducer.
And to complete the picture, there is a third case: you do need aggregation, and in that case you need a real reducer.
