Spark vs. MapReduce: what is the principle that makes Spark faster than MR?

As far as I know, Spark preloads data from the disks of all nodes (HDFS) into each node's RDDs for computation. But MapReduce should also load data from HDFS into memory and then compute on it in memory. So why is Spark faster? Is it just because MapReduce loads data into memory every time it wants to perform a calculation, while Spark preloads it? Thank you very much.

1 answer

Spark is built around the concept of a Resilient Distributed Dataset (RDD), which lets it transparently keep data in memory and persist it to disk only when necessary.
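
To make this concrete, here is a minimal sketch of RDD persistence, assuming a spark-shell session (where `sc` is already defined) and a hypothetical input path:

```scala
import org.apache.spark.storage.StorageLevel

// `sc` is the SparkContext that spark-shell provides; the HDFS path
// is hypothetical, for illustration only.
val lines = sc.textFile("hdfs:///data/events.txt")

// MEMORY_AND_DISK keeps partitions in memory and spills them to disk
// only when they do not fit -- the transparent storage described above.
lines.persist(StorageLevel.MEMORY_AND_DISK)

// The first action materializes and caches the partitions; the second
// reuses them instead of re-reading the file from HDFS.
val total  = lines.count()
val errors = lines.filter(_.contains("ERROR")).count()
```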

MapReduce, on the other hand, shuffles and sorts the data after each Map and Reduce job (a synchronization barrier) and writes it to disk.

Spark has no such synchronization barrier slowing it down between jobs, and its use of memory makes the execution engine very fast.
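
For example, here is a sketch of a multi-pass pipeline (again assuming spark-shell and a hypothetical input path) that shows where the difference comes from:

```scala
// `sc` is the spark-shell SparkContext; the input path is hypothetical.
val counts = sc.textFile("hdfs:///data/events.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)   // the only shuffle in this pipeline
  .cache()              // intermediate result stays in cluster memory

// Two more passes over `counts`. MapReduce would have to run separate
// jobs here, each reading the previous job's output back from HDFS;
// Spark serves both from the cached in-memory partitions.
val vocabulary = counts.count()
val topTen     = counts.sortBy(-_._2).take(10)
```

Each extra pass in MapReduce pays the full HDFS write-and-read cost, while in Spark only the first shuffle does.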
