cache() is the same as persist(StorageLevel.MEMORY_ONLY), and your data volume probably exceeds the available memory. When that happens, Spark evicts cached partitions on a least-recently-used (LRU) basis and recomputes them from the lineage the next time they are needed.
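A minimal sketch of the equivalence, assuming a running spark-shell (so `sc` already exists) and a hypothetical input path:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input path for illustration.
val lines = sc.textFile("hdfs:///tmp/input.txt")

// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
lines.cache()
// equivalent to:
// lines.persist(StorageLevel.MEMORY_ONLY)
```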
You can configure how much memory is available for caching through configuration parameters. For more information, see the Spark documentation, in particular: spark.driver.memory, spark.executor.memory, spark.storage.memoryFraction.
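A hedged sketch of setting these, with illustrative values (not recommendations); the app name is a placeholder, and spark.storage.memoryFraction belongs to the legacy (pre-1.6) memory manager, with spark.memory.fraction used by newer versions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("CachingExample")                // hypothetical app name
  .set("spark.executor.memory", "4g")          // memory per executor
  .set("spark.storage.memoryFraction", "0.6")  // legacy: share of executor heap for cached blocks

// Note: spark.driver.memory usually cannot be set here, because the driver
// JVM is already running; pass it at launch instead, e.g.
//   spark-submit --driver-memory 2g ...
val sc = new SparkContext(conf)
```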
I'm not an expert, but I don't think textFile() automatically caches anything; the Spark Quick Start caches the RDD built from a text file explicitly: sc.textFile(logFile, 2).cache()
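For context, a sketch following the Quick Start pattern (logFile is a hypothetical path): nothing is cached until cache() is called, and even then the data is only materialized lazily, by the first action that touches it.

```scala
val logFile = "README.md"                      // hypothetical path
val logData = sc.textFile(logFile, 2).cache()  // 2 = minimum number of partitions

// First action materializes the RDD and populates the cache;
// the second is served from memory (for partitions that fit).
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, lines with b: $numBs")
```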