ERROR YarnClientSchedulerBackend: Asked to remove non-existent executor 21

When I first run

lines = sc.textFile(os.path.join(folder_name),100)

and then

parsed_lines = (lines.map(lambda line: parse_line(line, ["udid"]))
                     .persist(StorageLevel.MEMORY_AND_DISK)
                     .groupByKey(1000)
                     .take(10))

I get the following error:

...
ERROR YarnClientSchedulerBackend: Asked to remove non-existent executor 21
...
WARN TaskSetManager: Lost task 0.1 in stage 11.7 (TID 1151, <machine name>): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=896, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

I tried changing the following parameters, as well as the number of partitions in groupByKey and the number of partitions passed to the textFile function.

conf.set("spark.cores.max", "128")
conf.set("spark.akka.frameSize", "1024")
conf.set("spark.executor.memory", "6G")
conf.set("spark.shuffle.file.buffer.kb", "100")

I am not sure how to determine these parameters based on the capabilities of the workers, the size of the input, and the transformations that I will apply.
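For reference, one rule-of-thumb approach I have seen is to derive the partition count from the input size and the executor settings from what each worker can actually offer to YARN. The figures below (worker count, cores, RAM, input size) are hypothetical placeholders, not my actual cluster numbers:

from pyspark import SparkConf

# Hypothetical cluster figures -- replace with your own.
num_workers = 8          # worker nodes available to YARN
cores_per_worker = 8     # cores YARN can hand to executors on each node
worker_ram_gb = 16       # memory YARN can hand to executors on each node
input_size_gb = 200      # total size of the files under folder_name

# Rule of thumb: partitions of roughly 128 MB each, and at least
# 2-3 tasks per core so a few slow tasks do not stall the stage.
target_partition_mb = 128
num_partitions = max(int(input_size_gb * 1024 / target_partition_mb),
                     num_workers * cores_per_worker * 3)

conf = (SparkConf()
        .set("spark.executor.instances", str(num_workers))
        .set("spark.executor.cores", str(cores_per_worker))
        # Leave a few GB of headroom per node for YARN/OS overhead.
        .set("spark.executor.memory", "%dG" % (worker_ram_gb - 4)))

num_partitions would then replace the hard-coded 100 in textFile and 1000 in groupByKey.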

1 answer

I got the same error. I solved the problem by reducing the number of executors requested in my spark-defaults.conf.

Originally it was:

spark.executor.instances 7

I switched it to read:

spark.executor.instances 4

I did not change anything else and was able to avoid the error.
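If you prefer not to touch spark-defaults.conf, the same cap can be set per application. A minimal sketch, assuming you build the context yourself (the value 4 is just the example above, not a recommendation):

from pyspark import SparkConf, SparkContext

# Cap the number of YARN executors for this application only.
conf = SparkConf().set("spark.executor.instances", "4")
sc = SparkContext(conf=conf)

On YARN, passing --num-executors 4 to spark-submit has the same effect.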
