Spark uses the Resilient Distributed Dataset (RDD) concept, which lets you transparently keep data in memory and spill it to disk only when necessary.
In MapReduce, by contrast, between the Map and Reduce phases of a job the data is shuffled, sorted (a synchronization barrier), and written to disk.
Spark has no such synchronization barrier slowing it down, and working in memory makes its execution engine very fast.
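
To make that concrete, here is a minimal sketch in Scala (the app name, local master, and toy data are my own assumptions for illustration): it persists an RDD with the `MEMORY_AND_DISK` storage level, so repeated actions reuse the in-memory partitions and Spark only spills to disk under memory pressure.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RddCacheSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for illustration.
    val spark = SparkSession.builder()
      .appName("rdd-cache-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Build an RDD and ask Spark to keep it in memory,
    // spilling partitions to disk only if memory runs out.
    val squares = sc.parallelize(1 to 1000000)
      .map(n => n.toLong * n)
      .persist(StorageLevel.MEMORY_AND_DISK)

    // Both actions reuse the cached partitions instead of
    // recomputing the map step and writing intermediates to disk.
    println(squares.count())
    println(squares.take(5).mkString(", "))

    spark.stop()
  }
}
```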