I'm new to Spark. I am running Spark in standalone mode on my Mac. I bring up a master and a worker, and they both start fine. The master log is as follows:
...
14/02/25 18:52:43 INFO Slf4jLogger: Slf4jLogger started
14/02/25 18:52:43 INFO Remoting: Starting remoting
14/02/25 18:52:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077]
14/02/25 18:52:43 INFO Master: Starting Spark master at spark://Shirishs-MacBook-Pro.local:7077
14/02/25 18:52:43 INFO MasterWebUI: Started Master web UI at http://192.168.1.106:8080
14/02/25 18:52:43 INFO Master: I have been elected leader! New state: ALIVE
14/02/25 18:53:03 INFO Master: Registering worker Shirishs-MacBook-Pro.local:53956 with 4 cores, 15.0 GB RAM
The worker log is as follows:
14/02/25 18:53:02 INFO Slf4jLogger: Slf4jLogger started
14/02/25 18:53:02 INFO Remoting: Starting remoting
14/02/25 18:53:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@192.168.1.106:53956]
14/02/25 18:53:02 INFO Worker: Starting Spark worker 192.168.1.106:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:53:02 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/25 18:53:02 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/25 18:53:02 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/25 18:53:03 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077
Now, when I submit a job, the job fails to execute (with a class-not-found error), but the worker also dies. Here is the master log:
14/02/25 18:55:52 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
14/02/25 18:55:52 INFO Master: Launching driver driver-20140225185552-0000 on worker worker-20140225185302-192.168.1.106-53956
14/02/25 18:55:55 INFO Master: Registering worker Shirishs-MacBook-Pro.local:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:55:55 INFO Master: Attempted to re-register worker at same address: akka.tcp://sparkWorker@192.168.1.106:53956
14/02/25 18:55:55 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.1.106%3A53962-2#-21389169] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:55:57 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@Shirishs-MacBook-Pro.local:7077] -> [akka.tcp://driverClient@192.168.1.106:53961]: Error [Association failed with [akka.tcp://driverClient@192.168.1.106:53961]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://driverClient@192.168.1.106:53961]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /192.168.1.106:53961
]
...
14/02/25 18:55:57 INFO Master: akka.tcp://driverClient@192.168.1.106:53961 got disassociated, removing it.
14/02/25 18:56:03 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:10 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:18 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:25 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:33 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
14/02/25 18:56:40 WARN Master: Got heartbeat from unregistered worker worker-20140225185555-192.168.1.106-53956
...
The worker log is as follows:
14/02/25 18:55:52 INFO Worker: Asked to launch driver driver-20140225185552-0000
2014-02-25 18:55:52.534 java[11415:330b] Unable to load realm info from SCDynamicStore
14/02/25 18:55:52 INFO DriverRunner: Copying user jar file:/Users/shirish_kumar/Developer/spark_app/SimpleApp to /Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140225185552-0000/SimpleApp
14/02/25 18:55:53 INFO DriverRunner: Launch Command: "/Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/bin/java" "-cp" ":/Users/shirish_kumar/Developer/spark-0.9.0-incubating/work/driver-20140225185552-0000/SimpleApp:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/conf:/Users/shirish_kumar/Developer/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar" "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" "akka.tcp://sparkWorker@192.168.1.106:53956/user/Worker" "SimpleApp"
14/02/25 18:55:55 ERROR OneForOneStrategy: FAILED (of class scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/02/25 18:55:55 INFO Worker: Starting Spark worker 192.168.1.106:53956 with 4 cores, 15.0 GB RAM
14/02/25 18:55:55 INFO Worker: Spark home: /Users/shirish_kumar/Developer/spark-0.9.0-incubating
14/02/25 18:55:55 INFO WorkerWebUI: Started Worker web UI at http://192.168.1.106:8081
14/02/25 18:55:55 INFO Worker: Connecting to master spark://Shirishs-MacBook-Pro.local:7077...
14/02/25 18:55:55 INFO Worker: Successfully registered with master spark://Shirishs-MacBook-Pro.local:7077
After that, the web UI shows the worker as DEAD.
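For completeness, here is roughly how I submitted the driver (reconstructed from the DriverRunner log above; I used the standard Spark 0.9 standalone submission via `org.apache.spark.deploy.Client`, and I suspect the jar argument and main class are part of the problem, since the jar URL does not point at a packaged .jar and the class name is not fully qualified):

```shell
# Spark 0.9 standalone-mode driver submission; the jar URL and main class
# here are exactly what appears in the DriverRunner log. Normally the jar
# argument should be a built .jar and the class a fully qualified name.
./bin/spark-class org.apache.spark.deploy.Client launch \
  spark://Shirishs-MacBook-Pro.local:7077 \
  file:/Users/shirish_kumar/Developer/spark_app/SimpleApp \
  SimpleApp
```

Even if the jar path or class name is wrong, that should only make the driver fail, not take the worker down with it.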
My question is: has anyone encountered this problem? A worker should not die just because a job has failed.
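For what it's worth, the stack trace points at a pattern match inside Worker.receive (Worker.scala:277) that apparently has no case for the FAILED driver state, so the worker actor throws scala.MatchError and gets restarted by its Akka supervisor, which would explain the re-registration and the DEAD status. A minimal sketch of my own (not Spark source) showing how a non-exhaustive match on an Enumeration value produces exactly this error:

```scala
// Hypothetical illustration of the crash in the worker log: a pattern
// match over an Enumeration that omits one of the state values.
object DriverState extends Enumeration {
  val RUNNING, FINISHED, FAILED = Value
}

def handleStateChange(state: DriverState.Value): String = state match {
  case DriverState.RUNNING  => "driver running"
  case DriverState.FINISHED => "driver finished, cleaning up"
  // No case for FAILED: calling handleStateChange(DriverState.FAILED)
  // throws scala.MatchError: FAILED (of class scala.Enumeration$Val),
  // the same exception seen at Worker.scala:277 in the log above.
}
```

Inside an actor's receive, that uncaught exception reaches the OneForOneStrategy supervisor, which is consistent with the `ERROR OneForOneStrategy: FAILED` line in my worker log.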