Map Output Materialized Bytes and Map Output Bytes

In Hadoop job counters, what is the difference between "Map output materialized bytes" and "Map output bytes"? I don't see the former when I turn off map output compression, so I am guessing that it is the actual (compressed) output bytes, and that the latter is the uncompressed bytes?

1 answer

I think you're right. From http://hadoop.apache.org/docs/r1.0.4/releasenotes.html :

MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN). New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize. (Siddharth Seth)

(Changes since Hadoop 0.20.2)


Here is a quote from Tom White's "Hadoop: The Definitive Guide", 3rd edition (Table 8-2, p. 261):

"Map output materialized bytes" - the number of bytes of map output actually written to disk. If map output compression is enabled, this is reflected in the counter value.

"Map output bytes" - the number of bytes of uncompressed output produced by all the maps in the job. Incremented every time the collect() method is called on a map's OutputCollector.
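
To make the distinction concrete, here is a minimal Python sketch of the two counts. It is only an illustration, not Hadoop code: the function name `count_map_output` is hypothetical, the tab/newline serialization is an assumption, zlib stands in for whatever compression codec the job is configured with, and the sketch ignores any framing or checksum bytes that Hadoop adds when it writes map output to disk.

```python
import zlib


def count_map_output(records, compress):
    """Return (map_output_bytes, materialized_bytes) for a toy map task.

    Hypothetical helper: it mimics the two counters, it does not read them
    from a real Hadoop job.
    """
    map_output_bytes = 0
    buffer = bytearray()
    for key, value in records:
        # Assumed serialization: key, tab, value, newline.
        serialized = key.encode("utf-8") + b"\t" + value.encode("utf-8") + b"\n"
        # Counted per emitted record, before any compression is applied.
        map_output_bytes += len(serialized)
        buffer += serialized
    # "Materialized" here means the bytes that would actually land on disk.
    on_disk = zlib.compress(bytes(buffer)) if compress else bytes(buffer)
    return map_output_bytes, len(on_disk)


records = [("hadoop", "1")] * 1000
raw, materialized_plain = count_map_output(records, compress=False)
_, materialized_compressed = count_map_output(records, compress=True)
print(raw, materialized_plain, materialized_compressed)
```

In this toy model the two counts are equal when compression is off, and the materialized count drops below the uncompressed one when it is on. In a real job the values may differ slightly even without compression, because of the on-disk overhead the sketch leaves out.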
