New MapReduce and Eclipse Architecture

Some basic refactoring of Hadoop took place around MapReduce. Details can be found in the JIRA issue below.

https://issues.apache.org/jira/browse/MAPREDUCE-279

It has the daemons ResourceManager, NodeManager and HistoryServer. Has anyone tried to run them in Eclipse? That would make development and debugging easier.

I posted on the Hadoop forums and no one had tried it. I just wanted to check whether someone has done something similar on Stack Overflow.

2 answers

I have been trying to run YARN (next-generation MapReduce) on my host for several days.

First, get the source code from apache.org using svn or git. Taking svn as an example:

svn co https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23.0 

Then generate the Eclipse-related files with Maven (you must have Maven 3 configured on your host before this step):

 mvn test -DskipTests
 mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

Now you can import the existing Maven project into Eclipse (you must first configure the Maven plugin in Eclipse).

In Eclipse: File -> Import

  • Choose "Existing Projects into Workspace"
  • Select the hadoop-mapreduce-project directory as the root directory
  • Select the hadoop-mapreduce-project project
  • Click "Finish"

I failed many times because of an incorrect classpath/build path setting that did not include all of the package/dependency classes. In "Project Properties", try "Add External Class Folder" and select the build directory of the current project; otherwise you will run into the same problem as I did.


Update: 2012-03-15

I can now run YARN (the same as Hadoop 0.23) in Eclipse.

First, you must successfully compile/build YARN with this command:

 mvn clean package -Pdist -Dtar -DskipTests 

Since I only care about how to debug YARN, I run HDFS on my single host from a Linux terminal, not in Eclipse:

 bin/hdfs namenode -format -clusterid your_hdfs_id
 sbin/hadoop-daemon.sh start namenode
 sbin/hadoop-daemon.sh start datanode

Then import Hadoop 0.23 into Eclipse and find ResourceManager.java; the next step is to run this class in Eclipse. Detailed steps:

  • Right-click the class and select Run As -> Java Application.
  • Add a new run configuration for this class, and fill in the program arguments with:

    --config your_yarn_conf_dir (the same conf directory as for HDFS)

  • Press the Run button; you will see the ResourceManager output in the Eclipse console.
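The conf directory passed via `--config` just needs the usual Hadoop `*-site.xml` files. As a rough sketch (the file name follows standard Hadoop conventions, but the `fs.defaultFS` value and port 9000 are illustrative assumptions, not taken from the answer above), a minimal conf dir pointing at a local NameNode could be created like this:

```shell
# Sketch: build a minimal conf dir for the --config argument.
# The hdfs://localhost:9000 address is an illustrative assumption.
CONF_DIR=./yarn_conf
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```

The same directory can then be reused for both the HDFS daemons and the YARN run configurations in Eclipse.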

Running the NodeManager in Eclipse is similar to running the ResourceManager: add a new run configuration, fill in the arguments with "--config your_yarn_conf_dir", and click the Run button.

Happy coding!


Wait for https://issues.apache.org/jira/browse/MAPREDUCE-3131 to be completed. In any case, you can check out that version and try to run it.

You will need mvn site:site to generate the site, which contains all the documentation. And how to find out how it works? You can open the debug.sh scripts and see for yourself.

Basically, we pass JAVA_OPTIONS to set the parameters for Eclipse remote debugging. This becomes tricky for child processes, since you need to set the mapred.child.java.opts property for them.
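As a concrete sketch of the setup described above (the port 8000, the `YARN_RESOURCEMANAGER_OPTS` variable name, and writing `mapred-site.xml` into the current directory are all illustrative assumptions, not details from the answer), the idea is to pass JDWP agent options to the daemon JVM, and put the same options into `mapred.child.java.opts` for child task JVMs:

```shell
# JDWP agent options for Eclipse remote debugging. Port 8000 is an
# arbitrary choice; suspend=y makes the JVM wait for the debugger.
DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

# Hypothetical: hand the options to a daemon via its *_OPTS variable.
export YARN_RESOURCEMANAGER_OPTS="$DEBUG_OPTS"

# Child (task) JVMs do not inherit these options; the property goes into
# mapred-site.xml instead (written to the current directory here, purely
# for illustration).
cat > mapred-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>$DEBUG_OPTS</value>
  </property>
</configuration>
EOF
```

With suspend=y every task JVM blocks until a debugger attaches, so in practice you would enable this only for the one process you want to step through.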

HTH

-P

