The combiner runs after the Mapper and before the Reducer. It takes as input all the data emitted by the Mapper instances on a given node and then passes its output on to the Reducers. So the number of combine input records should not be larger than the number of map output records.
Job counters (note the two highlighted ones: Combine input records is larger than Map output records):

```
12/08/29 13:38:49 INFO mapred.JobClient: Map-Reduce Framework
12/08/29 13:38:49 INFO mapred.JobClient:   Reduce input groups=8649
12/08/29 13:38:49 INFO mapred.JobClient:   Map output materialized bytes=306210
12/08/29 13:38:49 INFO mapred.JobClient:   Combine output records=859412
12/08/29 13:38:49 INFO mapred.JobClient:   Map input records=457272
12/08/29 13:38:49 INFO mapred.JobClient:   Reduce shuffle bytes=0
12/08/29 13:38:49 INFO mapred.JobClient:   Reduce output records=8649
12/08/29 13:38:49 INFO mapred.JobClient:   Spilled Records=1632334
12/08/29 13:38:49 INFO mapred.JobClient:   Map output bytes=331837344
12/08/29 13:38:49 INFO mapred.JobClient:   Combine input records=26154506
12/08/29 13:38:49 INFO mapred.JobClient:   Map output records=25312392
12/08/29 13:38:49 INFO mapred.JobClient:   SPLIT_RAW_BYTES=218
12/08/29 13:38:49 INFO mapred.JobClient:   Reduce input records=17298
```
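One way to see how combine input records can exceed map output records is to count records across more than one combiner pass. The sketch below is a simplified, in-memory simulation, not Hadoop code: `CombinerSketch`, `spillEvery`, and the counter fields are hypothetical names, and the counters are plain static fields rather than Hadoop's `Counters` API. It assumes the combiner runs once per spill and then again while the spills are merged (in Hadoop 1.x, my understanding is that the merge-time pass happens only when there are at least `min.num.spills.for.combine` spills, 3 by default). Records produced by the first pass are fed back into the combiner, so they get counted as combine input a second time.

```java
import java.util.*;

public class CombinerSketch {
    // Simulated counters named after the job log entries (hypothetical, not Hadoop's API).
    static long mapOutputRecords = 0;
    static long combineInputRecords = 0;
    static long combineOutputRecords = 0;

    // Word-count style combiner: sums the values of each key within one batch of records.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> records) {
        combineInputRecords += records.size();
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        combineOutputRecords += out.size();
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"a b a", "b c", "a c c", "b b a"};
        int spillEvery = 4; // hypothetical buffer size: spill after this many map output records
        List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
        List<Map.Entry<String, Integer>> spilled = new ArrayList<>();

        // Map phase: emit (word, 1) and run the combiner each time the buffer "spills".
        for (String line : lines) {
            for (String word : line.split(" ")) {
                buffer.add(new AbstractMap.SimpleEntry<>(word, 1));
                mapOutputRecords++;
                if (buffer.size() == spillEvery) {
                    spilled.addAll(combine(buffer)); // first pass, per spill
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) {
            spilled.addAll(combine(buffer)); // final spill
        }

        // Merge phase: the combiner runs again over the already-combined spill output.
        List<Map.Entry<String, Integer>> merged = combine(spilled);

        System.out.println("Map output records=" + mapOutputRecords);
        System.out.println("Combine input records=" + combineInputRecords);
        System.out.println("Combine output records=" + combineOutputRecords);
        System.out.println("Merged output=" + merged);
    }
}
```

With this sample input there are 11 map output records, but the combiner sees 17 input records (4 + 4 + 3 from the three spills, plus 6 more during the merge), which mirrors the pattern in the counters above.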