Run a Hadoop job remotely

I am trying to run a MapReduce job from outside the cluster.

For example: the Hadoop cluster runs on Linux machines, and we have a web application running on a Windows computer. We want to start the Hadoop job from this remote web application, then read the Hadoop output directory and present its contents as a graph.

We wrote the following code snippet:

    Configuration conf = new Configuration();
    Job job = new Job(conf);
    conf.set("mapred.job.tracker", "192.168.56.101:54311");
    conf.set("fs.default.name", "hdfs://192.168.56.101:54310");
    job.setJarByClass(Analysis.class);
    //job.setOutputKeyClass(Text.class);
    //job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    //job.set
    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.waitForCompletion(true);

This is the error we get. The error stays the same even if we disconnect from the Hadoop 1.1.2 cluster.

    14/03/07 00:23:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/03/07 00:23:37 ERROR security.UserGroupInformation: PriviledgedActionException as:user cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user818037780\.staging to 0700
    Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user818037780\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
        at LineCounter.main(LineCounter.java:86)
1 answer

When working with a remote system, you must work as a remote user. You can do this in your main class as follows:

    public static void main(String a[]) {
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");
        try {
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    Job job = new Job(conf);
                    conf.set("hadoop.job.ugi", "root");
                    // write your remaining piece of code here.
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Also, when submitting a MapReduce job, the client has to copy your Java classes, together with their dependent jars, to the Hadoop cluster where the map and reduce tasks actually run. You can read more here.
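If you want to be explicit about which jar the client ships to the cluster (instead of relying only on setJarByClass), you can name it in the configuration before the Job object is created. The sketch below is only an illustration and makes a few assumptions: a Hadoop 1.x client, the mapred.jar property (the key that JobConf.setJar() writes), and a made-up jar path on the Windows machine.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithJar {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the remote cluster (same addresses as in the question).
            conf.set("fs.default.name", "hdfs://192.168.56.101:54310");
            conf.set("mapred.job.tracker", "192.168.56.101:54311");
            // Name the jar that should be shipped to the cluster. The path is a
            // hypothetical example; use the jar-with-dependencies you actually built.
            conf.set("mapred.jar", "C:/jobs/job-jar-with-dependencies.jar");

            // Create the Job only after the configuration is complete:
            // new Job(conf) takes a copy, so later conf.set() calls are not seen by the job.
            Job job = new Job(conf);
            // ... set mapper, reducer, formats and input/output paths as in the question ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

When the jar is named this way, the client uploads it to the job's staging directory during submission, so the tasks running on the Linux nodes can load Analysis, CustomFileInputFormat and the rest of your classes.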

So you need to build an executable jar of your code (with Analysis as the main class in your case) that includes all the dependent jar files on its classpath. Then run your jar file from the command line using:

 java -jar job-jar-with-dependencies.jar arguments 

HTH!
