How many reducers? (from official documentation)
The right number of reducers seems to be 0.95 or 1.75 multiplied by (number of nodes) * (number of maximum containers per node).
With 0.95, all of the reducers can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.
Increasing the number of reducers increases the framework overhead, but improves load balancing and lowers the cost of failures.
The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative tasks and failed tasks.
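As a worked example of the rule of thumb above (the cluster figures here are hypothetical): on a 10-node cluster with at most 8 containers per node, the two factors give roughly 76 and 140 reducers.

public class ReducerCount {
    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes, at most 8 containers per node.
        int nodes = 10;
        int maxContainersPerNode = 8;
        // 0.95: all reducers launch at once, in a single wave.
        int singleWave = (int) (0.95 * nodes * maxContainersPerNode); // 76
        // 1.75: faster nodes finish early and run a second wave.
        int twoWaves = (int) (1.75 * nodes * maxContainersPerNode); // 140
        System.out.println(singleWave + " reducers (one wave), " + twoWaves + " reducers (two waves)");
    }
}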
The same documentation also covers the number of maps.
How many maps?
The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 maps for very CPU-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.
Thus, if you expect 10TB of input data and have a block size of 128MB, you will end up with 82,000 maps, unless Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) is used to set it even higher.
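The arithmetic behind that figure, as a small sketch (10TB and 128MB are the numbers from the text above):

public class MapCount {
    public static void main(String[] args) {
        long inputBytes = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB of input
        long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block size
        long maps = inputBytes / blockSize; // one map per input block
        System.out.println(maps + " maps"); // 81920, i.e. roughly 82,000
    }
}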
If you want to change the default value of 1 for the number of reducers, you can set the property below (available since Hadoop 2.x) as a command line parameter:
mapreduce.job.reduces
OR
you can set it programmatically using
job.setNumReduceTasks(integer_number);
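For concreteness, here is a minimal sketch of both approaches. The jar, class, and path names are placeholders, and the -D form is only picked up automatically if the driver is run through ToolRunner/GenericOptionsParser.

hadoop jar myjob.jar MyDriver -D mapreduce.job.reduces=10 /input /output

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");
        job.setJarByClass(MyDriver.class);
        // Programmatic equivalent of -D mapreduce.job.reduces=10:
        job.setNumReduceTasks(10);
        // ... setMapperClass/setReducerClass and output types would go here ...
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}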
Take a look at another related SE question: What is the ideal number of reducers on Hadoop?