Spark 0.9.0: worker keeps dying in standalone mode when the job fails

I'm new to Spark. I am running Spark in standalone mode on my Mac. I start a master and a worker, and they both come up fine. The master's log is as follows:

...
14/02/25 18:52:43 INFO Slf4jLogger: Slf4jLogger started
14/02/25 18:52:43 INFO Remoting: Starting remoting
14/02/25 18:52:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077]
14/02/25 18:52:43 INFO Master: Starting Spark master at spark://Shirishs-MacBook-Pro.local:7077
14/02/25 18:52:43 INFO MasterWebUI: Started Master web UI at http://192.168.1.106:8080
14/02/25 18:52:43 INFO Master: I have been elected leader! New state: ALIVE
14/02/25 18:53:03 INFO Master: Registering worker Shirishs-MacBook-Pro.local:53956 with 4 cores, 15.0 GB RAM

The worker's log is as follows:

14/02/25 18:53:02 INFO Slf4jLogger: Slf4jLogger started
14/02/25 18:53:02 INFO Remoting: Starting remoting
14/02/25 18:53:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@192.168.1.106:53956]
14/02/25 18:53:02 INFO Worker: Starting Spark worker 192.168.1.106:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:53:02 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/25 18:53:02 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/25 18:53:02 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/25 18:53:03 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077

Now, when I submit a job, the job fails to run (due to a class-not-found error), but the worker dies as well. Here is the master's log:

14/02/25 18:55:52 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
14/02/25 18:55:52 INFO Master: Launching driver driver-20140225185552-0000 on worker worker-20140225185302-192.168.1.106-53956
14/02/25 18:55:55 INFO Master: Registering worker Shirishs-MacBook-Pro.local:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:55:55 INFO Master: Attempted to re-register worker at same address: akka.tcp://sparkWorker@192.168.1.106:53956
14/02/25 18:55:55 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.1.106%3A53962-2#-21389169] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] -> [akka.tcp://driverClient@192.168.1.106:53961]: Error [Association failed with [akka.tcp://driverClient@192.168.1.106:53961]] [
  akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@192.168.1.106:53961]
  Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:53961
]
...
...
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:56:03 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:10 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:18 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:25 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:33 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:40 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
...

The worker's log is as follows:

14/02/25 18:55:52 INFO Worker: Asked to launch driver driver-20140225185552-0000
2014-02-25 18:55:52.534 java[11415:330b] Unable to load realm info from SCDynamicStore
14/02/25 18:55:52 INFO DriverRunner: Copying user jar file:/Users/shirish_kumar/Developer/spark_app/SimpleApp to /Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140225185552-0000/SimpleApp
14/02/25 18:55:53 INFO DriverRunner: Launch Command: "/Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/bin/java" "-cp" ":/Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140225185552-0000/SimpleApp:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/conf:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker@192.168.1.106:53956/user/Worker" "SimpleApp"
14/02/25 18:55:55 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/02/25 18:55:55 INFO Worker: Starting Spark worker 192.168.1.106:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:55:55 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/25 18:55:55 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/25 18:55:55 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/25 18:55:55 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077

After this, the web UI shows the worker as DEAD.

My question is: has anyone encountered this problem? A worker should not die just because a job failed.

2 answers

Check the work folder under your Spark directory (SPARK_HOME/work). There you can see the exact error for that particular driver.

For me, it was a class-not-found exception. The fix is to give the fully qualified name of the application's main class (i.e., include the package name).

Then clear the work directory and launch the application in standalone mode again. It will work!
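For illustration, a minimal main class with an explicit package might look like the sketch below (the package name com.example is hypothetical; substitute your own). The class would then be submitted as com.example.SimpleApp, not just SimpleApp:

    // Hypothetical package name -- this is what must be included when
    // you pass the main class to the launcher: com.example.SimpleApp
    package com.example

    import org.apache.spark.SparkContext

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        // Standalone master URL taken from the logs in the question.
        val sc = new SparkContext("spark://Shirishs-MacBook-Pro.local:7077", "Simple App")
        val counted = sc.parallelize(1 to 100).count()
        println("Count: " + counted)
        sc.stop()
      }
    }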


You must specify the path to your JAR files.

Programmatically, you can do it like this:

 sparkConf.set("spark.jars", "file:/myjar1,file:/myjarN") 

This implies, of course, that you must build the JAR file first.

You also need to include the dependent JARs; there are several ways to automate that, but a full treatment is beyond the scope of this question. A fuller sketch of the configuration is shown below.
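As a sketch, the spark.jars setting can be combined with the rest of the configuration like this (the JAR path here is illustrative, based on the directory shown in the question's logs, and is an assumption):

    import org.apache.spark.{SparkConf, SparkContext}

    // Build the configuration first, pointing spark.jars at the
    // application JAR (plus any dependency JARs, comma-separated)
    // so they are shipped to the cluster.
    val sparkConf = new SparkConf()
      .setMaster("spark://Shirishs-MacBook-Pro.local:7077")
      .setAppName("Simple App")
      .set("spark.jars", "file:/Users/shirish_kumar/Developer/spark_app/SimpleApp.jar")

    val sc = new SparkContext(sparkConf)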

