Spark loses all executors one minute after starting

I am running pyspark on an 8-node Google Dataproc cluster with default settings. A few seconds after starting, I see 30 executor cores (as expected):

    >>> sc.defaultParallelism
    30

After a minute:

    >>> sc.defaultParallelism
    2

From then on, all actions run on only 2 cores:

    >>> rng = sc.parallelize(range(1, 1000000))
    >>> rng.cache()
    >>> rng.count()
    >>> rng.getNumPartitions()
    2

If I run rng.cache() while the executors are still attached, they stay attached and jobs get distributed across them.

Checking the monitoring UI (port 4040 on the master node) shows that the executors are removed:

    Executor 1
    Removed at 2016/02/25 16:20:14
    Reason: Container container_1456414665542_0006_01_000002 exited from explicit termination request.

What is going on here, and how can I keep the executors from being removed?

For the most part, what you are seeing is just a difference in how Spark on YARN is configured compared to Spark standalone. Note also that YARN's reported "VCores Used" does not correspond to an actual reservation of cores; containers are based on the memory reservation.

In more detail:

Spark on YARN, which is how Dataproc runs Spark, enables a feature called "dynamic allocation" by default. With dynamic allocation, Spark gives executors back to YARN when they sit idle and requests them again when it has work to schedule, so that other jobs sharing the cluster can use those resources in the meantime.

In your case, the executors are idle right after startup (they have no tasks to run and, since nothing has been cached yet, no cached data that would keep YARN from reclaiming them), so they are removed after 60 seconds: the default executor idle timeout is 60 seconds, which is exactly the one-minute delay you observed.
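
If you would rather keep dynamic allocation but hold on to idle executors for longer, the idle timeout itself is configurable. Below is a minimal sketch in pyspark, assuming you create the SparkContext yourself in a submitted script (in the pyspark shell, `sc` is built for you, so there you would pass the same setting with `--conf` at launch); the application name is only a placeholder:

    from pyspark import SparkConf, SparkContext

    # Keep dynamic allocation, but only hand executors back to YARN after
    # 10 minutes of inactivity instead of the 60-second default.
    conf = (SparkConf()
            .setAppName("keep-executors-longer")  # placeholder name
            .set("spark.dynamicAllocation.executorIdleTimeout", "600s"))

    sc = SparkContext(conf=conf)
    print(sc.defaultParallelism)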

To turn dynamic allocation off entirely, you can run:

    spark-shell --conf spark.dynamicAllocation.enabled=false

    gcloud dataproc jobs submit spark --properties spark.dynamicAllocation.enabled=false --cluster <your-cluster> foo.jar

Alternatively, if you specify a fixed number of executors, dynamic allocation is automatically disabled as well:

    spark-shell --conf spark.executor.instances=123

    gcloud dataproc jobs submit spark --properties spark.executor.instances=123 --cluster <your-cluster> foo.jar
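
Since the question uses pyspark rather than spark-shell, here is a minimal sketch of the same fix applied inside a submitted script. The property names are the real Spark settings, while the application name and executor count are only illustrative:

    from pyspark import SparkConf, SparkContext

    # Disable dynamic allocation and pin a fixed number of executors,
    # so YARN does not reclaim them after the idle timeout.
    conf = (SparkConf()
            .setAppName("fixed-executors")              # placeholder name
            .set("spark.dynamicAllocation.enabled", "false")
            .set("spark.executor.instances", "8"))      # illustrative count

    sc = SparkContext(conf=conf)

    rng = sc.parallelize(range(1, 1000000))
    rng.cache()
    rng.count()
    print(rng.getNumPartitions())   # should no longer be stuck at 2
    print(sc.defaultParallelism)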
