Spark on YARN: how to prevent the usercache from growing rapidly

The YARN usercache seems to be growing rapidly in disk usage. I have an HDP 2.3.4 setup in which the YARN local directories were configured on disks local to the individual slave nodes.

The slave drive partition configured for the YARN local directories quickly filled up. I then moved the local directories onto the partition that holds the HDFS data, where they count as non-DFS used space. This let my application run considerably longer, but it only pushed the problem to a later stage: once my cluster is handling more than 100 million events, HDFS disk usage approaches 90%, with most of that usage being non-DFS used space (which is not even replicated). At that point all the YARN NodeManagers are marked unhealthy and work grinds to a halt.
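I suspect the NodeManagers stop because of the disk health checker: as far as I understand, the NodeManager marks a local directory as bad once its disk passes a utilization threshold (90% by default), and the node goes unhealthy when too many of its local dirs are bad. A minimal yarn-site.xml sketch of the properties I believe are involved (values shown are the Hadoop defaults, not my actual settings):

```xml
<!-- yarn-site.xml (illustrative; these are the Hadoop defaults) -->

<!-- A local dir is marked bad once its disk exceeds this utilization
     percentage; the NodeManager then stops using that dir. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>

<!-- Fraction of local dirs that must remain healthy for the node to keep
     accepting new containers. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
```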

Questions:

  • Is there a way to make the usercache expire (get cleaned up) more quickly? (A sketch of the settings I suspect are relevant follows this list.)
  • Is putting the usercache on the HDFS data disks a good idea?
  • The cache seems to use more space than the data actually being analyzed. Are there other reasons it can grow this quickly?
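Regarding the first question, the only knobs I have found so far are the NodeManager localizer cache settings sketched below (Hadoop defaults shown). As far as I can tell they govern the localized-resource cache (filecache), not the per-application appcache where Spark shuffle and spill files live, so I am not sure they address the growth I am seeing:

```xml
<!-- yarn-site.xml: localizer cache cleanup (values are the Hadoop defaults) -->

<!-- Target size, in MB, of the NodeManager's cache of localized resources;
     the deletion service trims the cache back toward this size. -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>

<!-- How often, in milliseconds, the cache cleanup runs. -->
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>

<!-- Should stay 0; a non-zero debug delay keeps finished containers'
     local files around and would make the disk usage worse. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>
```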