I am new to Spark. I tried searching but could not find a solution. I installed Hadoop 2.7.2 on two boxes (one master node and one worker node), setting up the cluster by following this guide: http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/ I ran the Hadoop and Spark applications as the root user to test the cluster.
I installed Spark on the master node, and Spark starts without any errors. However, when I submit a job with spark-submit, I get a FileNotFoundException, even though the file is present on the master node at the exact path shown in the error. Here is the spark-submit command I ran, followed by the log output.
/bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster /app/spark-test.jar
04/16/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform ... using builtin-java classes where applicable
04/16/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
04/16/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers
04/16/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
04/16/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
04/16/21 19:16:14 INFO Client: Setting up container launch context for our AM
04/16/21 19:16:14 INFO Client: Setting up the launch environment for our AM container
04/16/21 19:16:14 INFO Client: Preparing resources for our AM container
04/16/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file: /mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
04/16/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file: /app/spark-test.jar
04/16/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip
04/16/21 19:16:14 INFO SecurityManager: Changing view acls to: root
04/16/21 19:16:14 INFO SecurityManager: Changing modify acls to: root
04/16/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
04/16/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager
04/16/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001
04/16/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
04/16/21 19:16:16 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461246375622
final status: UNDEFINED
tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/
user: root
04/16/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
04/16/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
04/16/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
04/16/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
04/16/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED)
04/16/21 19:16:21 INFO Client:
client token: N/A
diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001 Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461246375622
final status: FAILED
tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461246306015_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I even tried placing my application jar on HDFS and providing the HDFS path in the spark-submit command. Even then it throws a FileNotFoundException, this time for one of the Spark conf files. Here is the spark-submit command I ran, followed by the log output.
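For reference, the commands I used to stage the jar on HDFS were roughly as follows (a sketch: the /beacon/job directory and namenode URI match the submit command below; the exact flags are from memory):

```shell
# Sketch of the staging steps; the namenode URI is the value of
# fs.defaultFS in my core-site.xml (an assumption for this example).
NN=hdfs://sparkcluster01.testing.com:9000
JAR_LOCAL=/app/spark-test.jar
JAR_HDFS=$NN/beacon/job/spark-test.jar

# Only run the HDFS commands when the client is on the PATH
# (so the sketch is harmless on a box without Hadoop installed).
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /beacon/job
  hdfs dfs -put -f "$JAR_LOCAL" /beacon/job/
  hdfs dfs -ls /beacon/job    # confirm the jar is visible cluster-wide
fi

echo "$JAR_HDFS"
```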
./bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar
04/16/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
04/16/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers
04/16/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
04/16/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
04/16/21 18:11:46 INFO Client: Setting up container launch context for our AM
04/16/21 18:11:46 INFO Client: Setting up the launch environment for our AM container
04/16/21 18:11:46 INFO Client: Preparing resources for our AM container
04/16/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file: /mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
04/16/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar
04/16/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip
04/16/21 18:11:50 INFO SecurityManager: Changing view acls to: root
04/16/21 18:11:50 INFO SecurityManager: Changing modify acls to: root
04/16/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
04/16/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager
04/16/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017
04/16/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
04/16/21 18:11:51 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461242510849
final status: UNDEFINED
tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/
user: root
04/16/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
04/16/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
04/16/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED)
04/16/21 18:11:54 INFO Client:
client token: N/A
diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017 Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461242510849
final status: FAILED
tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
04/16/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called
04/16/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21