Default number of reducers

In Hadoop, if we have not set the number of reducers, how many reducers will be created?

The number of mappers, for example, depends on (total data size) / (input split size). If the data size is 1 TB and the input split size is 100 MB, the number of mappers will be (1000 * 1000) / 100 = 10000 (ten thousand).
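The arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, it uses the same decimal units as the question (1 TB = 1000 * 1000 MB), and it rounds up on the assumption that a final partial split still needs its own mapper.

```java
// Hypothetical helper, not part of the Hadoop API.
final class MapperEstimate {

    // Estimates the mapper count as ceil(totalSize / splitSize).
    // Both arguments must use the same unit (MB here).
    static long estimateMappers(long totalSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        return (totalSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long totalMb = 1000L * 1000L; // 1 TB expressed in MB, as in the question
        long splitMb = 100L;          // input split size in MB
        System.out.println(estimateMappers(totalMb, splitMb)); // prints 10000
    }
}
```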

On what factors does the number of reducers depend? How many reducers are created for a job?

+6
2 answers

How many reduces? (from the official documentation)

The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes) * (maximum number of containers per node).

With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative tasks and failed tasks.

The same documentation also covers the number of maps.
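As a rough sketch of the 0.95 and 1.75 rule of thumb quoted above, the following Java computes both suggestions for a cluster. The class and method names are hypothetical, the 10-node, 8-container cluster is made up for illustration, and rounding down is an assumption; the documentation does not say how to round. Integer arithmetic (95/100 and 175/100) is used to avoid floating-point surprises.

```java
// Hypothetical helper, not part of the Hadoop API.
final class ReducerEstimate {

    // 0.95 * (nodes * maxContainersPerNode): all reduces can start in a single wave.
    static int singleWave(int nodes, int maxContainersPerNode) {
        return nodes * maxContainersPerNode * 95 / 100; // rounds down (assumption)
    }

    // 1.75 * (nodes * maxContainersPerNode): faster nodes pick up a second wave.
    static int twoWaves(int nodes, int maxContainersPerNode) {
        return nodes * maxContainersPerNode * 175 / 100; // rounds down (assumption)
    }

    public static void main(String[] args) {
        int nodes = 10;               // illustrative cluster size
        int maxContainersPerNode = 8; // illustrative container limit per node
        System.out.println(singleWave(nodes, maxContainersPerNode)); // prints 76
        System.out.println(twoWaves(nodes, maxContainersPerNode));   // prints 140
    }
}
```

Either result would then be passed to the job as its reducer count; which factor works better depends on how uneven the nodes and the reduce workloads are.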

How many maps?

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set as high as 300 maps for very CPU-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

Thus, if you expect 10TB of input data and have a block size of 128MB, you will end up with 82,000 maps, unless Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) is used to set it even higher.
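To make "the total number of blocks of the input files" concrete, here is a minimal Java sketch that counts blocks file by file. The names are hypothetical, and it deliberately ignores any split-size settings of the input format, so treat the result as an estimate of the map count rather than what the framework will report. It reproduces the 10TB / 128MB figure (81,920, which the documentation rounds to 82,000) and shows why many small files inflate the number of maps.

```java
// Hypothetical helper, not part of the Hadoop API.
final class BlockCount {

    // Blocks occupied by one file: ceil(fileBytes / blockBytes).
    // An empty file contributes 0 in this sketch.
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Blocks are counted per file, because a block never spans two files.
    static long totalBlocks(long[] fileSizes, long blockBytes) {
        long total = 0;
        for (long size : fileSizes) {
            total += blocksFor(size, blockBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockBytes = 128L * mb;

        // One 10 TB input (binary units): 81920 blocks.
        long[] oneBigFile = { 10L * 1024L * 1024L * mb };
        System.out.println(totalBlocks(oneBigFile, blockBytes)); // prints 81920

        // 1000 files of 1 MB each: under 1 GB of data, yet 1000 blocks.
        long[] smallFiles = new long[1000];
        java.util.Arrays.fill(smallFiles, mb);
        System.out.println(totalBlocks(smallFiles, blockBytes)); // prints 1000
    }
}
```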

If you want to change the default value of 1 for the number of reducers, you can set the following property (Hadoop 2.x and later) as a command-line parameter:

mapreduce.job.reduces

OR

you can set it programmatically with

 job.setNumReduceTasks(integer_number);

Take a look at another related SE question: What is the ideal number of reducers on Hadoop?

+8

By default, the number of reducers is 1.

You can change it by setting the parameter mapred.reduce.tasks (the older name of mapreduce.job.reduces) on the command line, in the driver code, or in the conf file that you pass.

For example:

Command-line argument: bin/hadoop jar ... -Dmapred.reduce.tasks=<num reduce tasks>

In the driver code: conf.setNumReduceTasks(int num);

Recommended reading: https://wiki.apache.org/hadoop/HowManyMapsAndReduces

+4