SparkException: Master removed our application

I know there are other very similar questions on Stack Overflow, but they either did not receive an answer or did not help me. Unlike those questions, I am including a lot more information from the stack trace and the log files. I hope this helps, even though it makes the question long and ugly. Sorry.

Setup

I am running a 9-node cluster on Amazon EC2 using m3.xlarge instances with DSE (DataStax Enterprise) version 4.6 installed. Each workload (Cassandra, Search and Analytics) uses 3 nodes. DSE 4.6 bundles Spark 1.1 and Cassandra 2.0.
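
For reference, the datacenter/workload layout can be checked with the standard Cassandra and DSE tools; a minimal sketch (output omitted):

    # lists the nodes grouped by datacenter (Cassandra, Search, Analytics)
    nodetool status

    # DSE-specific view that also shows each node's workload
    dsetool ring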

Problem

The application (Spark/Shark shell) is removed after ~3 minutes, even if I do not execute any query. Queries on small datasets run successfully, as long as they finish within ~3 minutes.

I would like to analyze much larger datasets, so I need the application (shell) not to be removed after ~3 minutes.

Error description

In a Spark or Shark shell, after ~3 minutes of idle time or while executing (long-running) queries, Spark eventually stops working and produces the following stack trace:

15/08/25 14:58:09 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask

This is not very helpful (to me), so here is more information from the log files.

Error Details / Log Files

Master

From master.log, these are the parts I think are relevant:

INFO 2015-08-25 09:19:59 org.apache.spark.deploy.master.DseSparkMaster: akka.tcp://sparkWorker@172.31.46.48:46715 got disassociated, removing it.
INFO 2015-08-25 09:19:59 org.apache.spark.deploy.master.DseSparkMaster: akka.tcp://sparkWorker@172.31.33.35:42136 got disassociated, removing it.

and

ERROR 2015-08-25 09:21:01 org.apache.spark.deploy.master.DseSparkMaster: Application Shark::ip-172-31-46-49 with ID app-20150825091745-0007 failed 10 times, removing it
INFO 2015-08-25 09:21:01 org.apache.spark.deploy.master.DseSparkMaster: Removing app app-20150825091745-0007

Why do the worker nodes get disassociated?

The stdout and stderr logs of the removed application (ID 1) do not reveal anything useful either.

The Spark Master UI reports the state as ALIVE. Screenshots:


Spark Master UI (screenshot)


Spark Master UI details (screenshot)


So I do not understand why the application gets removed. Is it because the workers get disassociated? And what exactly does "(failed) 10 times" refer to?
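
If I understand the Spark standalone master correctly, it gives up on an application once its executors have failed 10 times, which would explain the "failed 10 times, removing it" line above. To see what happened to the executors, grepping the master log for the application ID should help (the file is the same master.log quoted above):

    # count the log lines that mention the failed application
    grep -c "app-20150825091745-0007" master.log

    # show what happened to that application's executors
    grep "app-20150825091745-0007" master.log | grep -i executor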

On the Spark worker nodes I could not find anything relevant in the logs either. On each node, the stdout and stderr of the executors (for the removed application) look like this:

Screenshot I

Screenshot II

Edit: The Search workload nodes are running Solr.

Edit (2): The Spark/Shark queries are run on the Analytics nodes. The data in the Analytics datacenter was streamed from the Cassandra datacenter with nodetool rebuild (see the command below).
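
For completeness, a sketch of that rebuild step (assuming the source datacenter is literally named Cassandra, following the default DSE workload naming); it has to be run on every Analytics node:

    # run on each Analytics node; streams the existing data from the "Cassandra" datacenter
    nodetool rebuild -- Cassandra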

1 answer

We ran into something similar, and in our case it came down to a single node.

The short version: make sure every node can resolve every other node's hostname, either via DNS or via entries in /etc/hosts.

In our case the fix was to get hostname resolution right (AWS DNS and/or /etc/hosts entries on every node), so that dse can reach every node by its hostname.
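
A minimal sketch of what such /etc/hosts entries could look like, using the private IPs from the logs above (the hostnames are just the default EC2-internal names, so treat them as an assumption):

    # /etc/hosts on every node (illustrative; use your real hostnames and IPs)
    172.31.46.48   ip-172-31-46-48
    172.31.46.49   ip-172-31-46-49
    172.31.33.35   ip-172-31-33-35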

