I think you're right. From http://hadoop.apache.org/docs/r1.0.4/releasenotes.html :
MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN). New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize. (Siddharth Seth)
(Changes since Hadoop 0.20.2)
Here is a quote from Tom White's “Hadoop: The Definitive Guide”, 3rd edition (Table 8-2, p. 261):
"Map output materialized bytes" - the number of bytes of map output actually written to disk. If map output compression is enabled, this is reflected in the counter value.
"Map output bytes" - the number of bytes of uncompressed output produced by all the maps in the job. Incremented every time the collect() method is called on a map's OutputCollector.
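To make the difference between the two counters concrete, here is a rough sketch in plain Java. It does not use the Hadoop API at all: the class and method names are made up for illustration, Deflate stands in for whatever map output codec is configured, and it ignores any on-disk framing or checksum overhead that a real spill file may add.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

// Hypothetical illustration only; none of these names come from Hadoop.
public class MapOutputCounterSketch {

    // Analogue of "Map output bytes": the raw size of every emitted record,
    // summed up as if on each collect() call, before any compression.
    static long mapOutputBytes(List<String> records) {
        long total = 0;
        for (String record : records) {
            total += record.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Analogue of "Map output materialized bytes": the size of what actually
    // lands on "disk" (an in-memory buffer here), after optional compression.
    static long materializedBytes(List<String> records, boolean compress) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        try (OutputStream out = compress ? new DeflaterOutputStream(disk) : disk) {
            for (String record : records) {
                out.write(record.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return disk.size();
    }

    public static void main(String[] args) {
        // Highly repetitive sample output, so compression has a visible effect.
        List<String> records = Collections.nCopies(1000, "hadoop\t1\n");
        System.out.println("output bytes:               " + mapOutputBytes(records));
        System.out.println("materialized, uncompressed: " + materializedBytes(records, false));
        System.out.println("materialized, compressed:   " + materializedBytes(records, true));
    }
}
```

In this simplified model the two numbers match when compression is off, and the materialized figure drops below the output figure once compression is on, which is the behaviour the table describes.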