New MapReduce and Eclipse Architecture

Some basic refactoring of Hadoop took place around MapReduce. Details can be found in the JIRA issue below.

https://issues.apache.org/jira/browse/MAPREDUCE-279

It has the daemons ResourceManager, NodeManager and HistoryServer. Has anyone tried to run them in Eclipse? That would make development and debugging easier.

I posted on the Hadoop forums and no one had tried it. I just wanted to check whether someone has done something similar on Stack Overflow.

2 answers

I have been trying to run YARN (next-generation MapReduce) on my host for several days.

First, get the source code from apache.org using svn or git. Taking svn as an example:

svn co https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23.0 

Then generate the Eclipse-related files with Maven (you must have Maven 3 configured on your host before this step):

 mvn test -DskipTests
 mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

Now you can import the existing Maven project into Eclipse (you must first configure the Maven plugin in Eclipse).

In Eclipse: File -> Import

  • Choose "Existing Projects into Workspace"
  • Select the hadoop-mapreduce-project directory as the root directory
  • Select the hadoop-mapreduce-project project
  • Click "Finish"

I failed many times because of an incorrect classpath/build path setting that did not include all of the package/dependency classes. In "Project Properties", try "Add External Class Folder" and select the build directory of the current project; otherwise you will run into the same problem as I did.


Update: 2012-03-15

I can now run YARN (the same as Hadoop 0.23) in Eclipse.

First, you must successfully compile/build YARN with this command:

 mvn clean package -Pdist -Dtar -DskipTests 

Since I only care about how to debug YARN, I run HDFS on my single host from a Linux terminal, not in Eclipse:

 bin/hdfs namenode -format -clusterid your_hdfs_id
 sbin/hadoop-daemon.sh start namenode
 sbin/hadoop-daemon.sh start datanode

Then import Hadoop 0.23 into Eclipse and find ResourceManager.java; the next step is to run this class in Eclipse. Detailed steps:

  • Right-click the class and select Run As -> Java Application.
  • Add a new run configuration for this class, and fill in the program arguments with:

    --config your_yarn_conf_dir (the same conf directory as for HDFS)

  • Press the Run button; you will see the ResourceManager output in the Eclipse console.
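The conf directory passed via `--config` just needs the usual Hadoop `*-site.xml` files. As a rough sketch (the file name follows standard Hadoop conventions, but the `fs.defaultFS` value and port 9000 are illustrative assumptions, not taken from the answer above), a minimal conf dir pointing at a local NameNode could be created like this:

```shell
# Sketch: build a minimal conf dir for the --config argument.
# The hdfs://localhost:9000 address is an illustrative assumption.
CONF_DIR=./yarn_conf
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```

The same directory can then be reused for both the HDFS daemons and the YARN run configurations in Eclipse.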

Running the NodeManager in Eclipse is similar to running the ResourceManager: add a new run configuration, fill in the arguments with "--config your_yarn_conf_dir", and click the Run button.

Happy coding!


Wait for https://issues.apache.org/jira/browse/MAPREDUCE-3131 to be completed. In any case, you can check out that version and try to run it.

You will need mvn site:site to generate the site, which contains all the documentation. And how to find out how it works? You can open the debug.sh scripts and see for yourself.

Basically, we pass JAVA_OPTIONS to set the parameters for Eclipse remote debugging. This becomes tricky for child processes, since you need to set the mapred.child.java.opts property for them.
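As a concrete sketch of the setup described above (the port 8000, the `YARN_RESOURCEMANAGER_OPTS` variable name, and writing `mapred-site.xml` into the current directory are all illustrative assumptions, not details from the answer), the idea is to pass JDWP agent options to the daemon JVM, and put the same options into `mapred.child.java.opts` for child task JVMs:

```shell
# JDWP agent options for Eclipse remote debugging. Port 8000 is an
# arbitrary choice; suspend=y makes the JVM wait for the debugger.
DEBUG_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

# Hypothetical: hand the options to a daemon via its *_OPTS variable.
export YARN_RESOURCEMANAGER_OPTS="$DEBUG_OPTS"

# Child (task) JVMs do not inherit these options; the property goes into
# mapred-site.xml instead (written to the current directory here, purely
# for illustration).
cat > mapred-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>$DEBUG_OPTS</value>
  </property>
</configuration>
EOF
```

With suspend=y every task JVM blocks until a debugger attaches, so in practice you would enable this only for the one process you want to step through.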

HTH

-P

