Fixed: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I have a virtual machine in which spark-2.0.0-bin-hadoop2.7 is installed offline.

I ran ./sbin/start-all.sh to start the master and the worker.

When I run ./bin/spark-shell --master spark://192.168.43.27:7077 --driver-memory 600m --executor-memory 600m --executor-cores 1 on the virtual machine itself, the application state is RUNNING and I can run code in the spark shell.

(screenshot: running the spark shell on the virtual machine itself)

When I execute the exact same command from another computer on the network, the application state is again RUNNING, but the spark shell repeatedly throws WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources . I don't think the problem is directly related to resources, because the same command works on the virtual machine itself, but not when it is launched from other machines.

(screenshot: launching the spark shell from another machine on the network)

I have gone through most of the questions related to this error, and none of them solved my problem. I even turned off the firewall with sudo ufw disable just to make sure, but that did not help. This was based on this link, which suggests:

Disable the client firewall: this was the solution that worked for me. Since I was working on an internal prototype, I turned off the firewall on the client node. For some reason, the worker nodes could not talk back to the client for me. For production purposes, you would open only the specific set of required ports.

2 answers

There are two known reasons for this:

  • Your application requests more resources (cores, memory) than are allocated. In that case you need to increase the worker cores and memory; most of the other answers focus on this case.

  • Less well known: a firewall blocking communication between the master and the workers. This can happen especially if you are using a cloud service. According to Spark Security, in addition to the standard ports 8080, 8081, 7077 and 4040, you also need to make sure the master and workers can communicate via SPARK_WORKER_PORT , spark.driver.port and spark.blockManager.port ; the last three are used when submitting jobs and are assigned randomly if left unset. You can try opening all ports for a quick test; a minimal sketch of pinning these ports is shown after this list.
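For illustration, here is a minimal sketch of the port-pinning approach, reusing the master address from the question. The port numbers 40000 and 40001 and the ufw rules are only illustrative assumptions, not values prescribed by Spark:

# Pin the normally random driver-side ports so that only a known,
# small set of ports has to be reachable from the workers back to the client.
./bin/spark-shell \
  --master spark://192.168.43.27:7077 \
  --driver-memory 600m \
  --executor-memory 600m \
  --executor-cores 1 \
  --conf spark.driver.port=40000 \
  --conf spark.blockManager.port=40001

# Then allow those ports through the client firewall instead of disabling it
# entirely (ufw syntax, assuming an Ubuntu client):
sudo ufw allow 40000/tcp
sudo ufw allow 40001/tcp

With the ports fixed, you can verify connectivity from a worker node with a plain TCP check (for example, telnet or nc to the client on 40000) before retrying the job.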


To add an example for the first bullet in @Fountaine007's answer:

I ran into the same problem, and in my case it was because the allocated vcores were fewer than the application expected.

For my specific scenario, I increased the value of yarn.nodemanager.resource.cpu-vcores in $HADOOP_HOME/etc/hadoop/yarn-site.xml .

For memory-related problems, you may also need to modify yarn.nodemanager.resource.memory-mb .
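As a sketch, the relevant yarn-site.xml fragment could look like the following; the values 8 and 8192 are placeholders I chose for illustration, so pick numbers that match the CPU and memory actually available on your node:

<!-- $HADOOP_HOME/etc/hadoop/yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>      <!-- illustrative: vcores the NodeManager may hand out -->
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>   <!-- illustrative: memory in MB the NodeManager may hand out -->
</property>

After editing the file, restart the NodeManager so the new limits take effect, then resubmit the application.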

