Debugging Hadoop in Eclipse

Can I debug the Hadoop source code in Eclipse? I am not asking about debugging MapReduce tasks. I want to see which part of the Hadoop source code is responsible for scheduling MapReduce jobs, and how it works. Is there any mechanism for doing this?

debugging mapreduce hadoop
2 answers

You can download the Hadoop source, import it into Eclipse as a project, and step through it with the debugger. The stepping keys in the Eclipse debugger are:

  • F5: Step Into (enter the method being called)
  • F6: Step Over (run the current line without entering the methods it calls)
  • F7: Step Return (finish the current method and return to its caller)
  • F8: Resume (continue until the next breakpoint or the end of the program)

Alternatively, you can work out the workflow yourself by following the code step by step, starting from the run() method of your main (driver) class.

To answer your question: what schedules the map tasks?

As the diagram shows, the InputFormat class divides the input files into fixed-size pieces called InputSplits. Each split is then assigned to a mapper, which runs on the node to which the map task has been assigned.

The same InputFormat class also provides the RecordReader, which is responsible for parsing a split into records. Each record is passed to the map function as a (key, value) pair. So the Mapper class is the one that calls the map method.
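To make the split, record, map sequence concrete, here is a minimal plain-Java sketch of that flow. It does not use the Hadoop API: the class SplitFlowSketch and its methods getSplits, readRecords, map and runJob are hypothetical stand-ins, and the rule that a line belongs to the split in which it starts is a simplifying assumption of this sketch, not necessarily how Hadoop's own record reader treats split boundaries.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Simplified model of the InputFormat -> RecordReader -> Mapper flow (not the Hadoop API). */
public class SplitFlowSketch {

    /** Stand-in for InputFormat.getSplits: cut the input into {start, length} ranges. */
    public static List<int[]> getSplits(int totalLength, int splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < totalLength; start += splitSize) {
            int length = Math.min(splitSize, totalLength - start);
            splits.add(new int[] {start, length});
        }
        return splits;
    }

    /**
     * Stand-in for a RecordReader: turn one split into (offset, line) records.
     * Assumption of this sketch: a line belongs to the split in which it starts.
     */
    public static List<Map.Entry<Integer, String>> readRecords(String data, int start, int length) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        int end = start + length;
        int pos = start;
        // If the split begins in the middle of a line, skip ahead to the next line start.
        if (pos > 0 && data.charAt(pos - 1) != '\n') {
            int newline = data.indexOf('\n', pos);
            pos = (newline < 0) ? data.length() : newline + 1;
        }
        while (pos < end) {
            int newline = data.indexOf('\n', pos);
            int lineEnd = (newline < 0) ? data.length() : newline;
            records.add(new AbstractMap.SimpleEntry<>(pos, data.substring(pos, lineEnd)));
            pos = lineEnd + 1;
        }
        return records;
    }

    /** Stand-in for the user's map function in wordcount: emit (word, 1) for each word. */
    public static void map(int offset, String line, List<String> output) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(word + "\t1");
            }
        }
    }

    /** Stand-in for the mapper side of a job: one map() call per record of every split. */
    public static List<String> runJob(String data, int splitSize) {
        List<String> output = new ArrayList<>();
        for (int[] split : getSplits(data.length(), splitSize)) {
            for (Map.Entry<Integer, String> record : readRecords(data, split[0], split[1])) {
                map(record.getKey(), record.getValue(), output);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String data = "hello world\nhello hadoop\n";
        for (String pair : runJob(data, 10)) {
            System.out.println(pair);
        }
    }
}
```

In the real source, the corresponding places to set breakpoints would be the getSplits method of your InputFormat, the record reader it creates, and the run method of Mapper, which is where map is called for each record.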

The following is the workflow of the wordcount example:

[Image: workflow of the wordcount example]

Here, FileInputFormat is an abstract class that extends the abstract class InputFormat, and TextInputFormat extends FileInputFormat.
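The class hierarchy can be sketched in plain Java as follows. The Sketch* classes are hypothetical stand-ins with simplified signatures (strings instead of Hadoop's split, context and reader types). The division of work shown here, with the file-based class supplying getSplits and the text class supplying createRecordReader, reflects my understanding of the real classes, so check it against the Hadoop version you have imported.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of the InputFormat class hierarchy (not the Hadoop API). */
public class InputFormatHierarchySketch {

    /** Stand-in for InputFormat: declares both responsibilities, implements neither. */
    public abstract static class SketchInputFormat {
        public abstract List<String> getSplits(String path);
        public abstract String createRecordReader(String split);
    }

    /** Stand-in for FileInputFormat: implements splitting for files, still abstract. */
    public abstract static class SketchFileInputFormat extends SketchInputFormat {
        @Override
        public List<String> getSplits(String path) {
            // Two dummy splits, just to show where the splitting logic lives.
            return Arrays.asList(path + ":0", path + ":1");
        }
    }

    /** Stand-in for TextInputFormat: the concrete class that supplies a line-oriented reader. */
    public static class SketchTextInputFormat extends SketchFileInputFormat {
        @Override
        public String createRecordReader(String split) {
            return "line reader for " + split;
        }
    }

    public static void main(String[] args) {
        SketchInputFormat format = new SketchTextInputFormat();
        for (String split : format.getSplits("input.txt")) {
            System.out.println(format.createRecordReader(split));
        }
    }
}
```

When stepping through a wordcount job, this is why a breakpoint in getSplits lands in the file-based parent class while a breakpoint on reader creation lands in the text subclass.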


Here are instructions from the Apache Hadoop documentation. I have not tried them, but the instructions are good enough to get started.
