I just came across a rather odd use case for this.
Sometimes I find that I need a mapper that consumes all of its input before producing any output. I have done this in the past by writing the records in my cleanup() function. My map() function does not actually emit any records; it just reads the input and saves whatever will be needed in private structures.
It turns out this approach works fine unless you produce a lot of output. The best I can make out is that the mapper's spill facility does not operate during cleanup. As a result, the records written there simply accumulate in memory, and if there are too many of them you risk exhausting the heap. This is my guess about what is happening, and it may be wrong. But the problem definitely goes away with my new approach.
The new approach is to override run() instead of cleanup(). My only change to the default run() is that after the last record has been delivered to map(), I call map() once more with a null key and value. That is the signal for my map() function to go ahead and produce its output. Because the spill facility is still operating at that point, memory usage stays under control.
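To make the control flow concrete, here is a minimal sketch of the sentinel idea. It deliberately does not use the Hadoop classes: SketchContext, BufferingMapperSketch and runAll are hypothetical stand-ins, and the sorting step is only an assumed example of work that needs all the input first. In a real job the same loop would sit in an overridden run() on your Mapper subclass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Stand-in types only: this models the control flow, it is not the Hadoop API.
class BufferingMapperSketch {

    /** Hypothetical stand-in for the framework context: feeds input, collects output. */
    static class SketchContext {
        private final Iterator<String> input;
        private long nextKey = 0;
        private Long currentKey;
        private String currentValue;
        final List<String> output = new ArrayList<>();

        SketchContext(List<String> records) {
            this.input = records.iterator();
        }

        /** Advances to the next record; returns false once the input is exhausted. */
        boolean nextKeyValue() {
            if (!input.hasNext()) {
                return false;
            }
            currentKey = nextKey++;
            currentValue = input.next();
            return true;
        }

        Long getCurrentKey() { return currentKey; }
        String getCurrentValue() { return currentValue; }
        void write(String record) { output.add(record); }
    }

    // Private structure that holds everything until the input is finished.
    private final List<String> buffered = new ArrayList<>();

    /** Buffers normal records; a null key and value means "input done, emit now". */
    void map(Long key, String value, SketchContext context) {
        if (key == null && value == null) {
            // Assumed example of work that needs the whole input: emit in sorted order.
            Collections.sort(buffered);
            for (String record : buffered) {
                context.write(record);
            }
            buffered.clear();
            return;
        }
        buffered.add(value);
    }

    /** Same loop as a plain run(), plus one extra map() call with the null sentinel. */
    void run(SketchContext context) {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        map(null, null, context);
    }

    /** Convenience helper for trying the sketch out. */
    static List<String> runAll(List<String> records) {
        SketchContext context = new SketchContext(records);
        new BufferingMapperSketch().run(context);
        return context.output;
    }
}
```

Calling BufferingMapperSketch.runAll(...) on a small list shows that nothing is written until the sentinel call arrives. The point of doing the writes inside map() rather than cleanup() is only that, by my guess above, the framework is still spilling at that stage.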
Andy Lowry