Shuffling and sorting for mapreduce

Question


I read Hadoop: The Definitive Guide and some other links on the Internet, including here.

My question

Where does shuffling and sorting take place?

As I understand it, both happen on the mappers and on the reducers. But some links say that shuffling happens on the mappers and sorting on the reducers.

Can someone confirm whether I understood correctly? If not, can you point me to additional documentation that I can go through?


mapreduce hadoop

red 18 sept. '16 at 21:01


1 answer

mrsrinivas · Answer 1 · 2016-09-19T05:40:16+0000

Shuffling:

MapReduce guarantees that the input to every reducer is sorted by key. The process by which the system performs the sort and transfers the map outputs to the reducers as inputs is known as the shuffle.

Sorting:

Sorting happens at different stages of a MapReduce program, so it can take place in both the Map and the Reduce stages.

Please take a look at this diagram. (image)

Here is a more detailed description of the Map and Reduce steps shown in the image.

Map side:

When the map function starts producing output, it is not simply written to disk. Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers to which they will ultimately be sent. Within each partition, the background thread performs an in-memory sort by key.
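To make the map-side step concrete, here is a minimal Python sketch of "partition first, then sort within each partition". It is only an illustration and not Hadoop code: `partition_for` and `partition_and_sort` are hypothetical names, and the CRC32-modulo rule is an assumed stand-in for a hash-style partitioner.

```python
import zlib


def partition_for(key, num_reducers):
    # Assumed stand-in for a hash partitioner: a deterministic hash of the
    # key, taken modulo the number of reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_and_sort(map_output, num_reducers):
    # One bucket per reducer; every record with the same key lands in the
    # same bucket because the partition depends only on the key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[partition_for(key, num_reducers)].append((key, value))
    # In-memory sort by key inside each partition, before anything would be
    # written out.
    for bucket in partitions:
        bucket.sort(key=lambda kv: kv[0])
    return partitions


records = [("pear", 1), ("apple", 1), ("fig", 1), ("apple", 1), ("kiwi", 1)]
partitions = partition_and_sort(records, num_reducers=3)
for index, bucket in enumerate(partitions):
    print(index, bucket)
```

Each bucket comes out sorted by key, which is why the reduce side only has to merge already-sorted runs rather than sort from scratch.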

Reduce side:

When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, since the sorting was carried out on the map side), which merges the map outputs while maintaining their sort order. This is done in rounds.

Source: Hadoop: The Definitive Guide.
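The merge on the reduce side can be sketched in Python as well. This is a simplified illustration under stated assumptions, not what Hadoop actually executes: `merge_in_rounds`, `reduce_input` and the `merge_factor` parameter are hypothetical names, and the round strategy here (merge the first few runs, put the result back, repeat) is deliberately cruder than the real one.

```python
import heapq
from itertools import groupby


def merge_in_rounds(runs, merge_factor=10):
    # Each run is a list of (key, value) pairs that is already sorted by key,
    # as produced on the map side.
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    runs = [list(run) for run in runs]
    # While there are too many runs, merge merge_factor of them into one
    # longer sorted run. No re-sorting is needed, only merging.
    while len(runs) > merge_factor:
        merged = list(heapq.merge(*runs[:merge_factor], key=lambda kv: kv[0]))
        runs = runs[merge_factor:] + [merged]
    # Final round: merge whatever runs remain.
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))


def reduce_input(merged):
    # Group the merged stream so each key is seen once with all its values.
    for key, group in groupby(merged, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


runs = [
    [("a", 1), ("c", 1)],
    [("a", 2), ("b", 1)],
    [("b", 5)],
]
merged = merge_in_rounds(runs, merge_factor=2)
grouped = dict(reduce_input(merged))
print(grouped)
```

Because every run arrives sorted, the grouping step can rely on equal keys being adjacent in the merged stream.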
