What is the purpose of the org.apache.hadoop.mapreduce.Mapper.run() function in Hadoop?

Question

What is the purpose of the org.apache.hadoop.mapreduce.Mapper.run() function in Hadoop? setup() is called before map(), and cleanup() is called after map(). The documentation for run() states:

Expert users can override this method for more control over the execution of Mapper.

I am looking for the practical purpose of this function.

+8

function map hadoop

Praveen Sripati, Sep 18 '11 at 6:00

3 answers

I just came across a rather odd use case for this.

Sometimes I need a mapper that consumes all of its input before producing any output. In the past I did this by writing records in my cleanup() function. My map() function does not actually emit any records; it just reads the input and stores whatever is needed in private structures.

It turns out this approach works fine if you don't produce a lot of output. As best I can tell, the mapper's spill mechanism does not operate during cleanup, so the emitted records simply accumulate in memory, and if there are too many of them you risk exhausting the heap. This is my guess about what is happening, and it may be wrong, but the problem definitely goes away with my new approach.

The new approach is to override run() instead of cleanup(). My only change to the default run() is that after the last record has been delivered to map(), I call map() once more with a null key and value. This signals my map() function to go ahead and produce its output. Since the spiller is still running at that point, memory usage stays under control.
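The pattern described in this answer can be sketched roughly as follows. This is not the answerer's actual code. To keep it runnable without Hadoop on the classpath, SketchContext is a hypothetical stand-in for Mapper.Context that only mimics nextKeyValue(), getCurrentKey(), getCurrentValue() and write(), and FlushOnNullMapper is a hypothetical class that buffers word counts as an example of "private structures".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Hadoop's Mapper.Context (assumption, not the real API).
class SketchContext {
    private final Iterator<String> input;
    private long position = -1;
    private String current;
    final List<String> output = new ArrayList<>();

    SketchContext(List<String> records) {
        this.input = records.iterator();
    }

    // Advances to the next record; returns false when the input is exhausted.
    boolean nextKeyValue() {
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        position++;
        return true;
    }

    Long getCurrentKey() { return position; }
    String getCurrentValue() { return current; }
    void write(String key, int value) { output.add(key + "=" + value); }
}

// Hypothetical mapper: map() only buffers, and a null key/value pair means "emit now".
class FlushOnNullMapper {
    private final Map<String, Integer> counts = new TreeMap<>();

    void map(Long key, String value, SketchContext context) {
        if (key == null && value == null) {
            // End-of-input signal sent by run(): emit everything that was buffered.
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                context.write(e.getKey(), e.getValue());
            }
            counts.clear();
            return;
        }
        counts.merge(value, 1, Integer::sum);
    }

    // Same loop as the default run(), plus one extra map() call after the last record.
    void run(SketchContext context) {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        map(null, null, context);
    }
}
```

In a real job the same loop would sit inside an override of Mapper.run(Context), between the setup(context) and cleanup(context) calls, so the final output is written from map() rather than from cleanup().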

+1

Andy Lowry, Mar 27 '14 at 15:14

It could also be used for debugging: by overriding run() you can skip some of the input key-value pairs (that is, take a sample) to test your code.
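A minimal sketch of that sampling idea, under the same caveat as any illustration here: SampleContext is a hypothetical stand-in for Mapper.Context so the code runs without Hadoop, and SamplingMapper with its sampleEvery parameter is invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Hadoop's Mapper.Context (assumption, not the real API).
class SampleContext {
    private final Iterator<String> input;
    private String current;
    final List<String> output = new ArrayList<>();

    SampleContext(List<String> records) {
        this.input = records.iterator();
    }

    // Advances to the next record; returns false when the input is exhausted.
    boolean nextKeyValue() {
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        return true;
    }

    String getCurrentValue() { return current; }
    void write(String value) { output.add(value); }
}

// Hypothetical mapper whose run() hands only every N-th record to map().
class SamplingMapper {
    private final int sampleEvery;

    SamplingMapper(int sampleEvery) {
        if (sampleEvery < 1) {
            throw new IllegalArgumentException("sampleEvery must be >= 1");
        }
        this.sampleEvery = sampleEvery;
    }

    // The logic under test; here it just upper-cases the record.
    void map(String value, SampleContext context) {
        context.write(value.toUpperCase());
    }

    // Every record is still read, but only records 0, N, 2N, ... reach map().
    void run(SampleContext context) {
        long seen = 0;
        while (context.nextKeyValue()) {
            if (seen % sampleEvery == 0) {
                map(context.getCurrentValue(), context);
            }
            seen++;
        }
    }
}
```

Note that the skipped records are still read from the input; the sample only reduces how much work map() does and how much output is produced.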

0

DDW Aug 28 '13 at 9:34

Brian Roach · Accepted Answer · 2011-09-18T06:18:40+0000

By default, the run() method simply takes each key / value pair provided by the context and calls the map() method:

    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
    }

If you want to do more than that, you need to override it. For example, the MultithreadedMapper class overrides run() so that records are processed by multiple threads.
