Problems loading Spark jars

I am trying to run a simple MapReduce Java program using Spark over YARN (Cloudera Hadoop 5.2 on CentOS). I tried it in two different ways. The first way is the following:

YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/; /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --jars /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar simplemr.jar 

This method gives the following error:

Diagnostics: Application application_1434177111261_0007 failed 2 times due to AM Container for appattempt_1434177111261_0007_000002 exited with exitCode: -1000 due to: Resource hdfs://kc1ltcld29:9000/user/MyUser/.sparkStaging/application_1434177111261 hadoop2.4.0.jar changed on src file system (expected 1434549639128, was 1434549642191)

Then I tried without --jars:

 YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/; /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster simplemr.jar 

Diagnostics: Application application_1434177111261_0008 failed 2 times due to AM Container for appattempt_1434177111261_0008_000002 exited with exitCode: -1000 due to: File does not exist: hdfs://kc1ltcld29:9000/user/MyUser/.sparkStaging/application_14341771161 .0-hadoop2.4.0.jar
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.myuser
start time: 1434549879649
final status: FAILED
tracking URL: http://kc1ltcld29:8088/cluster/app/application_1434177111261_0008
user: myuser
Exception in thread "main" org.apache.spark.SparkException: Application application_1434177111261_0008 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:841)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:867)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/06/17 10:04:57 INFO util.Utils: Shutdown hook called
15/06/17 10:04:57 INFO util.Utils: Deleting directory /tmp/spark-2aca3f35-abf1-4e21-a10e-4778a039d0f4

I tried removing all the .jars from hdfs://users//.sparkStaging and resubmitting, but that did not help.

2 answers

The problem was solved by copying spark-assembly.jar into a directory on HDFS visible to every node, and then passing that HDFS path to spark-submit via the --conf spark.yarn.jar parameter. The commands are listed below:

 hdfs dfs -copyFromLocal /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar /user/spark/spark-assembly.jar

 /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar simplemr.jar

If you get this error, it means you are uploading assembly jars using the --jars option or manually copying them to HDFS on each node. I followed this approach and it works for me.

In yarn-cluster mode, spark-submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to manually copy the assembly to all nodes (or pass it via --jars). It seems your HDFS has two versions of the same jar.
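To avoid spark-submit re-uploading the assembly into .sparkStaging on every run, the same setting from the other answer can be made permanent in spark-defaults.conf. This is a sketch, assuming the assembly has already been copied to hdfs:///user/spark/spark-assembly.jar; the conf path matches the Spark install directory used in the question:

```
# /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/conf/spark-defaults.conf
# Point every submission at a single assembly jar already in HDFS,
# so spark-submit skips uploading its own copy each time.
spark.yarn.jar    hdfs:///user/spark/spark-assembly.jar
```

With this in place, spark-submit can be run without the --conf flag and YARN will localize the one cached jar instead of staging a fresh copy.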

Try removing all the old jars from your .sparkStaging directory and try again; it should work.
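For example, the stale staging directories can be cleared like this (a sketch of cluster commands, not runnable standalone; the username and the default .sparkStaging location under the submitting user's HDFS home directory are assumptions):

```shell
# List what is currently staged, to see which application dirs are stale
hdfs dfs -ls /user/myuser/.sparkStaging

# Remove the leftover staging directories from the failed submissions
hdfs dfs -rm -r '/user/myuser/.sparkStaging/application_1434177111261_*'

# Resubmit; spark-submit will upload a fresh copy of the assembly
```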

