Apache Spark: "Could not start org.apache.spark.deploy.worker.Worker" or "Wizard"

I created a Spark cluster on OpenStack, running on Ubuntu 14.04, with 8 GB of RAM. I created two virtual machines with 3 GB each (reserving 2 GB for the parent OS). I run the master and 2 workers on the first virtual machine, and 3 workers on the second machine.

The spark-env.sh file has a basic setup with

    export SPARK_MASTER_IP=10.0.0.30
    export SPARK_WORKER_INSTANCES=2
    export SPARK_WORKER_MEMORY=1g
    export SPARK_WORKER_CORES=1

Whenever I deploy the cluster with start-all.sh, I get "failed to start org.apache.spark.deploy.worker.Worker" and, several times, "failed to start org.apache.spark.deploy.master.Master". When I check the log file to look for an error, I see the following:

    Spark command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /home/ubuntu/spark-1.5.1/sbin/../conf/:/home/ubuntu/spark-1.5.1/assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-core-3.2.10.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 10.0.0.30 --port 7077 --webui-port 8080

Although I get the failure message, the master or worker does come alive a few seconds later.

Can someone explain the reason?

1 answer

Spark's configuration system is a mess of environment variables, argument flags, and Java property files. I just spent a couple of hours tracking down the same warning and unraveling the Spark initialization procedure, and here is what I found:

  • sbin/start-all.sh calls sbin/start-master.sh (and then sbin/start-slaves.sh)
  • sbin/start-master.sh calls sbin/spark-daemon.sh start org.apache.spark.deploy.master.Master ...
  • sbin/spark-daemon.sh start ... forks off a call to bin/spark-class org.apache.spark.deploy.master.Master ..., captures the resulting process id (pid), sleeps for 2 seconds, and then checks whether that pid's command name is "java" (see the sketch just after this list)
  • bin/spark-class is a bash script, so its command name starts out as "bash", and it then proceeds to:
    • (re)load the Spark environment by sourcing bin/load-spark-env.sh
    • locate the java executable
    • locate the right Spark jar
    • call java ... org.apache.spark.launcher.Main ... to get the full classpath needed for the Spark deployment
    • and finally hand over control, via exec, to java ... org.apache.spark.deploy.master.Master, at which point the command name becomes "java"
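
For reference, here is roughly what that check in sbin/spark-daemon.sh looks like, paraphrased from memory of the Spark 1.x scripts rather than copied verbatim (variable names like $command, $log and $pid are approximate):

    # paraphrased sketch of the Spark 1.x sbin/spark-daemon.sh "start" path (not verbatim)
    nohup nice -n "$SPARK_NICENESS" "$SPARK_HOME"/bin/spark-class $command "$@" \
      >> "$log" 2>&1 < /dev/null &
    newpid=$!
    echo "$newpid" > "$pid"
    sleep 2
    # if the forked process is not (yet) named "java", report a failure
    if [[ ! $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
      echo "failed to start $command"
      tail -2 "$log" | sed 's/^/  /'
    fi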

If steps 4.1 to 4.5 take longer than 2 seconds, which in my (and your) experience seems almost inevitable on a fresh OS where java has never been run before, you will get the "failed to start" message even though nothing actually failed.
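
A quick way to convince yourself that nothing actually failed is to look at the pid file the daemon script wrote and check what that process is. The path below assumes the defaults (pid files in /tmp, ident string equal to your user name); adjust it if you have set SPARK_PID_DIR or SPARK_IDENT_STRING:

    # assumed default pid file pattern: spark-<user>-<class>-<instance>.pid in /tmp
    PIDFILE=/tmp/spark-$USER-org.apache.spark.deploy.master.Master-1.pid
    ps -p "$(cat "$PIDFILE")" -o comm=   # prints "java" once the master has come up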

The slaves will complain for the same reason, and thrash around until the master is actually available, but they should keep retrying until they successfully connect to the master.

I have a fairly standard Spark deployment running on EC2; I use:

  • conf/spark-defaults.conf to set spark.executor.memory and add some custom jars via spark.{driver,executor}.extraClassPath
  • conf/spark-env.sh to set SPARK_WORKER_CORES=$(($(nproc) * 2))
  • conf/slaves to list my slaves (example contents of all three files are sketched below)
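
For concreteness, the three files look something like this; the values, jar path, and hostnames are illustrative stand-ins rather than my actual settings:

    ## conf/spark-defaults.conf
    spark.executor.memory           4g
    spark.driver.extraClassPath     /opt/myjars/custom-udfs.jar
    spark.executor.extraClassPath   /opt/myjars/custom-udfs.jar

    ## conf/spark-env.sh
    export SPARK_WORKER_CORES=$(($(nproc) * 2))

    ## conf/slaves -- one worker hostname per line
    worker1.example.internal
    worker2.example.internal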

Here's how I start a Spark deployment, bypassing the {bin,sbin}/*.sh minefield/maze:

    # on master, with SPARK_HOME and conf/slaves set appropriately
    mapfile -t ARGS < <(java -cp $SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.master.Master | tr '\0' '\n')
    # $ARGS now contains the full call to start the master, which I daemonize with nohup
    SPARK_PUBLIC_DNS=0.0.0.0 nohup "${ARGS[@]}" >> $SPARK_HOME/master.log 2>&1 < /dev/null &
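
Once that's backgrounded, I just watch the log and poke the web UI instead of trusting any "failed to start" style output; port 8080 here is an assumption based on using the defaults:

    tail -f $SPARK_HOME/master.log                                    # watch the master come up
    curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080    # 200 once the web UI is serving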

I still use sbin/spark-daemon.sh to start the slaves, since that is easier than messing with nohup inside the ssh command:

    MASTER=spark://$(hostname -i):7077
    while read -r; do
      ssh -o StrictHostKeyChecking=no $REPLY "$SPARK_HOME/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 $MASTER" &
    done <$SPARK_HOME/conf/slaves
    # this forks the ssh calls, so wait for them to exit before you logout
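
For completeness, tearing it all down again looks roughly like this under the same assumptions (default settings, same conf/slaves list); note that the master was nohup'd by hand, so it has no pid file and gets killed by class name instead:

    # stop the workers on each slave via the same daemon script
    while read -r; do
      ssh -o StrictHostKeyChecking=no $REPLY "$SPARK_HOME/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1" &
    done <$SPARK_HOME/conf/slaves
    wait
    # kill the hand-started master by matching its main class
    pkill -f org.apache.spark.deploy.master.Master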

There! It assumes I'm using all the default ports and such, and that I'm not doing anything silly like putting spaces in file names, but I think it's cleaner this way.
