I understand that each mapper produces one partition per reducer. How does a reducer know which partition to copy? Suppose a word count program runs mappers on 2 nodes, and there are 2 reducers. If the mapper on each node creates 2 partitions, and partitions on both nodes may contain the same word as a key, how will the reducers produce correct results?
For example:
Node 1 creates partition 1 and partition 2, and partition 1 contains the key "WHO".
Node 2 creates partition 3 and partition 4, and partition 3 contains the key "WHO".
If partition 1 and partition 4 went to reducer 1 (and the rest went to reducer 2), how would reducer 1 compute the correct word count?
If that cannot happen, and partitions 1 and 3 are instead made to go to reducer 1, how does Hadoop do it? Does it guarantee that key-value pairs with the same key, coming from different nodes, always go to the same reducer? If so, how is this done?
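To make the second scenario concrete, here is a minimal sketch of how it could work if the partition number is computed from the key alone. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the Python below only imitates that formula and is not Hadoop code. The names `java_string_hash`, `get_partition`, `map_node`, and `reduce_partition` are hypothetical, and partitions and reducers are numbered from 0 here rather than from 1 as in the example above.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Imitate Java's String.hashCode() with 32-bit signed arithmetic.

    Assumption: keys contain only characters in the Basic Multilingual Plane,
    so ord(ch) matches Java's UTF-16 code units.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to a signed one, as Java would hold it.
    return h - 0x100000000 if h >= 0x80000000 else h


def get_partition(key: str, num_reducers: int) -> int:
    """Same formula as HashPartitioner: clear the sign bit, then take the modulo."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def map_node(words, num_reducers):
    """Simulate one map node: emit (word, 1) into one partition per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for word in words:
        partitions[get_partition(word, num_reducers)].append((word, 1))
    return partitions


def reduce_partition(r, all_node_partitions):
    """Simulate reducer r: fetch partition r from every map node and sum counts."""
    counts = defaultdict(int)
    for node_partitions in all_node_partitions:
        for word, one in node_partitions[r]:
            counts[word] += one
    return dict(counts)


# Two map nodes, two reducers, both nodes see the key "WHO".
node1 = map_node(["WHO", "IS", "WHO"], 2)
node2 = map_node(["WHO", "ARE", "YOU"], 2)
for r in range(2):
    print(r, reduce_partition(r, [node1, node2]))
```

Because the partition index depends only on the key and the number of reducers, every node places "WHO" in the partition with the same index, and the reducer with that index collects that partition from all nodes. Under this assumption the first scenario, where the same key ends up at two different reducers, does not arise.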
Thanks, Suresh.