The hidden usercache directory seems to be growing rapidly in disk usage. I had an HDP 2.3.4 setup that was configured to use YARN local directories (local-dirs) that are local to the individual slave nodes.
The partition on each slave that was configured for local-dirs filled up quickly. I moved the local-dirs location to the HDFS partition, where it consumes non-DFS space. This let my application run significantly further, but it seems to have only moved the problem to a much later stage, when my cluster has processed more than 100 million events. At this point, HDFS usage approaches 90%, with most of it coming from non-DFS usage (so no replication is involved?). This causes the YARN NodeManagers on all nodes to shut down and the job to terminate.
Questions:
- Is there a way to make the usercache expire more quickly?
- Is putting the usercache on the HDFS partition a good idea?
- The cache space used seems much larger than the data being analyzed. Could there be other reasons for it to grow so quickly?