Fixed: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I have a virtual machine in which spark-2.0.0-bin-hadoop2.7 is installed offline.

I ran ./sbin/start-all.sh to start the master and the worker.

When I run ./bin/spark-shell --master spark://192.168.43.27:7077 --driver-memory 600m --executor-memory 600m --executor-cores 1 on the virtual machine itself, the application state is RUNNING and I can run code in the spark shell.

(screenshot: running the spark shell on the virtual machine itself)

When I execute the exact same command from another computer on the network, the application state is again RUNNING, but the spark shell repeatedly throws WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources . I don't think the problem is directly related to resources, because the same command works on the virtual machine itself, but not when it is launched from other machines.

(screenshot: launching the spark shell from another machine on the network)

I have gone through most of the questions related to this error, and none of them solved my problem. I even turned off the firewall with sudo ufw disable just to make sure, but that did not help. This was based on this link, which suggests:

Disable the client firewall: this was the solution that worked for me. Since I was working on an internal prototype, I turned off the firewall on the client node. For some reason, the worker nodes could not talk back to the client for me. For production purposes, you would open only the specific set of required ports.

2 answers

There are two known reasons for this:

  • Your application requests more resources (cores, memory) than are allocated. In that case you need to increase the worker cores and memory; most of the other answers focus on this case.

  • Less well known: a firewall blocking communication between the master and the workers. This can happen especially if you are using a cloud service. According to Spark Security, in addition to the standard ports 8080, 8081, 7077 and 4040, you also need to make sure the master and workers can communicate via SPARK_WORKER_PORT , spark.driver.port and spark.blockManager.port ; the last three are used when submitting jobs and are assigned randomly if left unset. You can try opening all ports for a quick test; a minimal sketch of pinning these ports is shown after this list.
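For illustration, here is a minimal sketch of the port-pinning approach, reusing the master address from the question. The port numbers 40000 and 40001 and the ufw rules are only illustrative assumptions, not values prescribed by Spark:

# Pin the normally random driver-side ports so that only a known,
# small set of ports has to be reachable from the workers back to the client.
./bin/spark-shell \
  --master spark://192.168.43.27:7077 \
  --driver-memory 600m \
  --executor-memory 600m \
  --executor-cores 1 \
  --conf spark.driver.port=40000 \
  --conf spark.blockManager.port=40001

# Then allow those ports through the client firewall instead of disabling it
# entirely (ufw syntax, assuming an Ubuntu client):
sudo ufw allow 40000/tcp
sudo ufw allow 40001/tcp

With the ports fixed, you can verify connectivity from a worker node with a plain TCP check (for example, telnet or nc to the client on 40000) before retrying the job.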


To add an example for the first bullet in @Fountaine007's answer:

I ran into the same problem, and in my case it was because the allocated vcores were fewer than the application expected.

For my specific scenario, I increased the value of yarn.nodemanager.resource.cpu-vcores in $HADOOP_HOME/etc/hadoop/yarn-site.xml .

For memory-related problems, you may also need to modify yarn.nodemanager.resource.memory-mb .
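As a sketch, the relevant yarn-site.xml fragment could look like the following; the values 8 and 8192 are placeholders I chose for illustration, so pick numbers that match the CPU and memory actually available on your node:

<!-- $HADOOP_HOME/etc/hadoop/yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>      <!-- illustrative: vcores the NodeManager may hand out -->
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>   <!-- illustrative: memory in MB the NodeManager may hand out -->
</property>

After editing the file, restart the NodeManager so the new limits take effect, then resubmit the application.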

