I wrote a MapReduce program in Java that I can submit to a remote cluster running in distributed mode. Currently I submit the job by following these steps:
- export the MapReduce job as a jar (for example, myMRjob.jar)
- submit the job to the remote cluster using the following shell command:
hadoop jar myMRjob.jar
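For context, the jar's entry point is an ordinary driver main. Here is a rough skeleton of mine (simplified and abbreviated; the input/output types on MyMapper are placeholders, not my real signatures):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CountRows {

    // the mapper is a static nested class, hence CountRows$MyMapper in the error trace below
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // map() omitted for brevity
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // ... conf.set(...) calls as shown further below ...
        Job job = new Job(conf, "COUNT ROWS");
        job.setJarByClass(CountRows.class);
        job.setMapperClass(MyMapper.class);
        // `hadoop jar` and running from Eclipse both end up at this same submission call
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}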
Instead, I want to submit the job directly from Eclipse when I run the program. How can I do this?
I am currently using CDH3. Here is a shortened version of my configuration code:
conf.set("hbase.zookeeper.quorum", getZookeeperServers()); conf.set("fs.default.name","hdfs://namenode/"); conf.set("mapred.job.tracker", "jobtracker:jtPort"); Job job = new Job(conf, "COUNT ROWS"); job.setJarByClass(CountRows.class);
When I run it directly from Eclipse, the job starts, but Hadoop cannot find the mappers/reducers. I get the following errors:
12/06/27 23:23:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/27 23:23:37 INFO mapred.JobClient: Task Id : attempt_201206152147_0645_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mypkg.mapreduce.CountRows$MyMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
...
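If it helps with diagnosis: the task JVMs on the cluster apparently cannot load CountRows$MyMapper, which makes me think no job jar is being shipped with the submission. A quick check I can add before submitting (getJar() comes from JobContext, which Job extends):

System.out.println("Job jar: " + job.getJar());  // prints the jar path under `hadoop jar`; I expect null from Eclipse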
Does anyone know how to get past these errors? If I can fix this, I can integrate more MR jobs into my scripts, which would be great!
Tucker