Running a MapReduce job as a different user

I have a web application that interacts with Hadoop (Cloudera CDH3u6). A particular user operation should launch a new MapReduce job on the cluster.

The cluster is not a secure cluster, but it uses simple group authentication - so when I am logged in as myself, I can run MR jobs from the command line.

In the web application, I use ToolRunner to run my job:

    MyMapReduceWrapperClass mr = new MyMapReduceWrapperClass();
    ToolRunner.run(mr, null);

    // inside the run implementation of my wrapper class:
    Job job = new Job(conf, "job title");
    // set up stuff removed
    job.submit();

Currently, the job is submitted as the user who started the web application server process (Tomcat). That user is a special local account on the web server and does not have permission to submit jobs to the cluster.

Ideally, I would like to obtain some form of identity from the user and pass it along, so that when different users interact with the web application / service, we can see who submitted which jobs. Leaving aside how to actually obtain those credentials, I don't even understand where they would go.

I see that Job has a getCredentials() method, but from what I have read it deals with tokens / Kerberos, and I got the impression that it is meant for secure clusters (which, I think, mine is not) - not to mention that I don't think Kerberos is installed on my web server. That could be fixed. But it also sounds like the intended use case is to add secrets that a MapReduce job may need at runtime to access other services, not to run the job as someone else.

I also see that the (older?) JobConf class has a setUser(String name) method, which seems promising - although I don't see where it would ask for a password or anything like that, and I cannot find much information or documentation on it. I tried it and it had no effect: the job was still submitted as the Tomcat user.

Are there other avenues I could look into or research? I am running out of keywords to Google. I would prefer not to end up with "just give the tomcat user rights on the cluster" - I do not manage that asset, and I do not expect that request to fly. If, however, that is my only option, I would like to understand why, so that I can argue the need with the correct information.

1 answer

You can use the UserGroupInformation class as follows:

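    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.util.ToolRunner;

    // username: the name of the user the job should run as
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(username);
    ugi.doAs(new PrivilegedExceptionAction<MyMapReduceWrapperClass>() {
        public MyMapReduceWrapperClass run() throws Exception {
            // everything inside run() executes as the user wrapped by ugi
            MyMapReduceWrapperClass mr = new MyMapReduceWrapperClass();
            ToolRunner.run(mr, null);
            return mr;
        }
    });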
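To make the scoping behaviour of doAs easier to picture, here is a toy model that runs with the JDK alone, without Hadoop on the classpath. It is only a sketch of the idea: the class DoAsSketch, its methods, the "tomcat" login name and the user "alice" are all hypothetical, and the ThreadLocal is a stand-in rather than Hadoop's implementation (the real UserGroupInformation is built on a JAAS Subject). The point it illustrates is that the identity applies only to code executed inside the action, and the process identity is back in effect afterwards.

```java
import java.util.concurrent.Callable;

public class DoAsSketch {
    // Hypothetical stand-in for the process login user (e.g. the Tomcat account).
    private static final String LOGIN_USER = "tomcat";

    // Identity in effect for the current thread; null means "use the login user".
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static String currentUser() {
        String user = CURRENT.get();
        return user != null ? user : LOGIN_USER;
    }

    // Runs the action with the given identity, then restores the previous one.
    static <T> T doAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.call();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Stand-in for job submission: reports the identity in effect at submit time.
    static String submitJob(String title) {
        return title + " submitted as " + currentUser();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(submitJob("job A"));                      // outside doAs
        System.out.println(doAs("alice", () -> submitJob("job B"))); // inside doAs
        System.out.println(submitJob("job C"));                      // identity restored
    }
}
```

In this model, only "job B" is attributed to alice; "job A" and "job C" fall back to the login user. This mirrors why, in the snippet above, the ToolRunner.run call has to sit inside run() rather than after doAs returns.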