Spark on YARN: how to prevent the usercache from growing rapidly

The YARN usercache seems to be growing rapidly in disk usage. I have an HDP 2.3.4 setup in which the YARN local directories were configured on disks local to the individual slave nodes.

The slave drive partition configured for the YARN local directories quickly filled up. I then moved the local directories onto the partition that holds the HDFS data, where they count as non-DFS used space. This let my application run considerably longer, but it only pushed the problem to a later stage: once my cluster is handling more than 100 million events, HDFS disk usage approaches 90%, with most of that usage being non-DFS used space (which is not even replicated). At that point all the YARN NodeManagers are marked unhealthy and work grinds to a halt.
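I suspect the NodeManagers stop because of the disk health checker: as far as I understand, the NodeManager marks a local directory as bad once its disk passes a utilization threshold (90% by default), and the node goes unhealthy when too many of its local dirs are bad. A minimal yarn-site.xml sketch of the properties I believe are involved (values shown are the Hadoop defaults, not my actual settings):

```xml
<!-- yarn-site.xml (illustrative; these are the Hadoop defaults) -->

<!-- A local dir is marked bad once its disk exceeds this utilization
     percentage; the NodeManager then stops using that dir. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>

<!-- Fraction of local dirs that must remain healthy for the node to keep
     accepting new containers. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
```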

Questions:

  • Is there a way to make the usercache expire (get cleaned up) more quickly? (A sketch of the settings I suspect are relevant follows this list.)
  • Is putting the usercache on the HDFS data disks a good idea?
  • The cache seems to use more space than the data actually being analyzed. Are there other reasons it can grow this quickly?
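Regarding the first question, the only knobs I have found so far are the NodeManager localizer cache settings sketched below (Hadoop defaults shown). As far as I can tell they govern the localized-resource cache (filecache), not the per-application appcache where Spark shuffle and spill files live, so I am not sure they address the growth I am seeing:

```xml
<!-- yarn-site.xml: localizer cache cleanup (values are the Hadoop defaults) -->

<!-- Target size, in MB, of the NodeManager's cache of localized resources;
     the deletion service trims the cache back toward this size. -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>

<!-- How often, in milliseconds, the cache cleanup runs. -->
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>

<!-- Should stay 0; a non-zero debug delay keeps finished containers'
     local files around and would make the disk usage worse. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>
```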