Running a MapReduce job from Eclipse

I wrote a MapReduce job in Java that I can submit to a remote cluster running in distributed mode. Currently, I submit the job using the following steps:

  • export the MapReduce job as a jar (for example, myMRjob.jar )
  • submit the job to the remote cluster using the following shell command: hadoop jar myMRjob.jar

I would like to submit the job directly from Eclipse when I run the program. How can I do this?

I am currently using CDH3, and here is a shortened version of my conf:

 conf.set("hbase.zookeeper.quorum", getZookeeperServers());
 conf.set("fs.default.name", "hdfs://namenode/");
 conf.set("mapred.job.tracker", "jobtracker:jtPort");

 Job job = new Job(conf, "COUNT ROWS");
 job.setJarByClass(CountRows.class);

 // Set up Mapper
 TableMapReduceUtil.initTableMapperJob(inputTable, scan,
     CountRows.MyMapper.class, ImmutableBytesWritable.class,
     ImmutableBytesWritable.class, job);

 // Set up Reducer
 job.setReducerClass(CountRows.MyReducer.class);
 job.setNumReduceTasks(16);

 // Set up overall output
 job.setOutputFormatClass(MultiTableOutputFormat.class);

 job.submit();

When I run this directly from Eclipse, the job starts, but Hadoop can't find the mappers/reducers. I get the following errors:

 12/06/27 23:23:29 INFO mapred.JobClient:  map 0% reduce 0%
 12/06/27 23:23:37 INFO mapred.JobClient: Task Id : attempt_201206152147_0645_m_000000_0, Status : FAILED
 java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mypkg.mapreduce.CountRows$MyMapper
     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
     at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
     at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ...

Does anyone know how to get past these errors? If I can fix this, I can integrate more MR jobs into my scripts, which would be great!

+7
3 answers

If you're submitting the Hadoop job from within an Eclipse project that defines the classes for the job, then you most likely have a classpath problem.

The call to job.setJarByClass(CountRows.class) is finding the class file on the build classpath, not in CountRows.jar (which may or may not have been built yet, or even be on the classpath).

You should be able to verify this by printing out the result of job.getJar() after you call job.setJarByClass(..) — if it doesn't display a jar file path, then it found the build class rather than the jar'd class.
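As a minimal, self-contained sketch (no Hadoop dependency; the class name JarLocator is made up for illustration), this is essentially the lookup setJarByClass performs: finding which classpath entry a class was loaded from. When you run from Eclipse, that location is typically the project's build output directory rather than a jar, so there is nothing for Hadoop to ship to the cluster:

```java
public class JarLocator {
    // Returns the classpath location a class was loaded from
    // (a directory when running from an IDE build, a .jar path otherwise).
    public static String locationOf(Class<?> cls) {
        return cls.getProtectionDomain().getCodeSource().getLocation().getPath();
    }

    // Hadoop can only ship the job's classes to the cluster
    // when they were loaded from an actual jar file.
    public static boolean isFromJar(Class<?> cls) {
        return locationOf(cls).endsWith(".jar");
    }

    public static void main(String[] args) {
        System.out.println("loaded from: " + locationOf(JarLocator.class));
        System.out.println("from a jar?  " + isFromJar(JarLocator.class));
    }
}
```

Running this inside Eclipse will typically print the bin/ (or target/classes) directory and "from a jar? false", which mirrors what job.getJar() returning null is telling you.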

+8

What worked for me was exporting a Runnable JAR (the difference between it and a plain JAR is that the former defines which class has the main method) and selecting the option to "package required libraries into the JAR" (choosing the "extract ..." option leads to duplicate errors, and it also extracts the class files from the jars, which, in my case, did not resolve the ClassNotFoundException).

After that, you can simply set the jar, as suggested by Chris White. For Windows it would look like this: job.setJar("C:\\MyJar.jar");
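As a small self-contained sketch (the JarCheck class and the C:\MyJar.jar path are illustrative assumptions, not part of any Hadoop API), it can help to fail fast if the hard-coded jar path is wrong, instead of finding out later as a ClassNotFoundException on the cluster:

```java
import java.io.File;

public class JarCheck {
    // Returns the path unchanged if a file exists there; otherwise throws,
    // so a typo in the hard-coded jar location fails immediately.
    public static String requireJar(String path) {
        File jar = new File(path);
        if (!jar.isFile()) {
            throw new IllegalArgumentException("Job jar not found: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical Windows path; note each backslash is doubled in Java source.
        String path = "C:\\MyJar.jar";
        // With a real Hadoop Job object you would then call:
        //   job.setJar(requireJar(path));
        System.out.println("would submit with jar: " + path);
    }
}
```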

In case it helps anyone, I made a presentation on what I learned from creating a MapReduce project and running it on Hadoop 2.2.0 on Windows 7 (in Eclipse Luna).

+2

I used the directions from the following site to configure a Map/Reduce project of mine so that I can run the project directly from Eclipse (without exporting it as a JAR): Configuring Eclipse to run a Hadoop Map/Reduce project

Note: if you decide to debug your program, your Mapper and Reducer classes won't be debuggable.

Hope this helps. :)

+1
