Question: When is a shuffle triggered in Spark?
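Answer: A shuffle is typically triggered by operations such as join, cogroup, and the *ByKey transformations (for example groupByKey, reduceByKey, and aggregateByKey). Any of these operations involves holding objects in hash maps or in-memory buffers to group or sort them. join, cogroup, and groupByKey use these data structures in the tasks of the stages on the fetching side of the shuffles they trigger. reduceByKey and aggregateByKey use them in the tasks of the stages on both sides of the shuffles they trigger, because they also combine values on the map side before the data is shuffled.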
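To make the map-side versus fetching-side distinction concrete, here is a minimal pure-Python model (not Spark code; the function names are hypothetical). It counts how many records would cross the shuffle for a sum, with and without the map-side combining that reduceByKey performs and groupByKey does not.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[Hashable, int]


def map_side_combine(partition: List[Record],
                     func: Callable[[int, int], int]) -> List[Record]:
    """Merge values per key inside one map partition, as reduceByKey does."""
    acc: Dict[Hashable, int] = {}
    for key, value in partition:
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())


def records_shuffled(partitions: List[List[Record]], combine: bool) -> int:
    """Count records sent across the shuffle boundary by all map partitions."""
    total = 0
    for part in partitions:
        # groupByKey-style: ship every record; reduceByKey-style: ship one per key.
        out = map_side_combine(part, lambda a, b: a + b) if combine else part
        total += len(out)
    return total


partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]
print(records_shuffled(partitions, combine=False))  # 7 (groupByKey-style)
print(records_shuffled(partitions, combine=True))   # 4 (reduceByKey-style)
```

The combining step is why reduceByKey needs a hash map on the map side as well as on the fetching side.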
Explanation: How does shuffle work in Spark?
Shuffle in Spark is conceptually similar to shuffle in Hadoop MapReduce, but there are several important differences between the two.
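Each map task in Spark writes out a shuffle file (an OS disk buffer) for every reducer, and each such file corresponds to a logical block in Spark. These files are not intermediate in the sense that Spark does not merge them into larger partitioned files. Because scheduling overhead in Spark is lower, the number of mappers (M) and reducers (R) is typically much higher than in Hadoop, so shipping M*R files to the respective reducers can add significant overhead.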
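A minimal pure-Python sketch of this file-per-reducer layout (an illustration with hypothetical helper names, not Spark's implementation): each map task hash-partitions its records into one bucket per reducer, so M map tasks produce M*R buckets. This layout is the one used by Spark's original hash-based shuffle; later releases made a sort-based shuffle the default.

```python
from typing import Hashable, List, Tuple

Record = Tuple[Hashable, int]


def reducer_for(key: Hashable, num_reducers: int) -> int:
    """Hash partitioning: a given key always maps to the same reducer."""
    return hash(key) % num_reducers


def write_map_output(records: List[Record], num_reducers: int) -> List[List[Record]]:
    """One map task writes one bucket ("shuffle file") per reducer."""
    buckets: List[List[Record]] = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[reducer_for(key, num_reducers)].append((key, value))
    return buckets


def shuffle_file_count(num_mappers: int, num_reducers: int) -> int:
    """Total files in a hash-based shuffle: one per (mapper, reducer) pair."""
    return num_mappers * num_reducers


map_tasks = [[("a", 1), ("b", 2)], [("c", 3), ("a", 4)]]
R = 3
outputs = [write_map_output(task, R) for task in map_tasks]
print(sum(len(buckets) for buckets in outputs))  # 6 buckets = 2 * 3
print(shuffle_file_count(1000, 1000))            # 1000000
```

With 1,000 mappers and 1,000 reducers the count already reaches a million files, which illustrates why the M*R growth matters.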
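As in Hadoop, Spark can compress map outputs: spark.shuffle.compress turns compression on, and the codec (selected with spark.io.compression.codec) can be Snappy (the default) or LZF. Snappy uses only 33 KB of buffer for each open file, which significantly reduces the risk of out-of-memory errors.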
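Why the per-file buffer size matters follows from the file-per-reducer layout: each running map task keeps one compressed stream open per reducer, so buffer memory grows with R. A back-of-the-envelope sketch, assuming 33 KB per open file for Snappy and using a hypothetical helper name:

```python
def compression_buffer_bytes(num_reducers: int,
                             concurrent_map_tasks: int,
                             buffer_per_file_kb: int = 33) -> int:
    """Estimate compression-buffer memory on one executor during the map phase.

    Assumes every running map task holds one open compressed output file per
    reducer, each with a fixed-size buffer (33 KB is the assumed Snappy figure).
    """
    return num_reducers * concurrent_map_tasks * buffer_per_file_kb * 1024


total = compression_buffer_bytes(num_reducers=1000, concurrent_map_tasks=8)
print(total)                  # 270336000
print(total / (1024 * 1024))  # 257.8125 (MiB)
```

The estimate scales linearly with the per-file buffer, so a small buffer per stream keeps the total manageable even with many reducers.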
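In the reduce phase, Spark, unlike Hadoop, requires all shuffled data for a reduce task to fit in that task's memory, whereas Hadoop can spill it to disk. This happens when a reduce task needs all of its shuffled data at once, for example for a groupByKey or reduceByKey operation. In that case Spark throws an out-of-memory exception, which has proved a real challenge for developers.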
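The reduce-side memory difference can be sketched in pure Python (a simplified model with hypothetical names, not Spark's internal data structures): grouping must keep every fetched value for a key, while a running reduction keeps one accumulator per key.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[Hashable, int]


def group_by_key(records: List[Record]) -> Dict[Hashable, List[int]]:
    """Reduce-side grouping: keeps every value of every key in memory."""
    groups: Dict[Hashable, List[int]] = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return groups


def reduce_by_key(records: List[Record],
                  func: Callable[[int, int], int]) -> Dict[Hashable, int]:
    """Reduce-side aggregation: keeps a single accumulator per key."""
    acc: Dict[Hashable, int] = {}
    for key, value in records:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


def values_held(table: Dict) -> int:
    """Number of values resident in the hash map."""
    return sum(len(v) if isinstance(v, list) else 1 for v in table.values())


# A skewed fetch: one "hot" key dominates the reducer's input.
fetched = [("hot", 1)] * 100_000 + [("cold", 1)] * 10
print(values_held(group_by_key(fetched)))                      # 100010
print(values_held(reduce_by_key(fetched, lambda a, b: a + b)))  # 2
```

With a skewed key, the grouped table grows with the input while the reduced table stays at one entry per key.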
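Spark also lacks Hadoop's overlapping copy phase, in which data starts moving to the reducers before all map tasks have completed. In Spark, shuffle is a pull operation: reducers fetch map outputs once the map stage is done. Each reducer must also maintain a network buffer to fetch map outputs; its size is set by spark.reducer.maxMbInFlight (48 MB by default).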
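The role of the in-flight limit can be sketched as a simple greedy plan: a reducer requests map-output blocks until the next one would push the bytes in flight past the limit, then waits for that round to finish. This is a simplified model with a hypothetical function name; Spark's actual fetch logic is more involved. In Spark 1.4 and later the setting is named spark.reducer.maxSizeInFlight.

```python
from typing import List


def plan_fetch_rounds(block_sizes: List[int], max_bytes_in_flight: int) -> List[List[int]]:
    """Group map-output blocks into fetch rounds bounded by max_bytes_in_flight.

    A block larger than the limit is still fetched, alone in its own round.
    """
    rounds: List[List[int]] = []
    current: List[int] = []
    in_flight = 0
    for size in block_sizes:
        # Start a new round if adding this block would exceed the limit.
        if current and in_flight + size > max_bytes_in_flight:
            rounds.append(current)
            current, in_flight = [], 0
        current.append(size)
        in_flight += size
    if current:
        rounds.append(current)
    return rounds


MB = 1024 * 1024
blocks = [20 * MB, 20 * MB, 20 * MB, 60 * MB, 5 * MB]
rounds = plan_fetch_rounds(blocks, max_bytes_in_flight=48 * MB)
print([[b // MB for b in r] for r in rounds])  # [[20, 20], [20], [60], [5]]
```

A larger limit means fewer, bigger rounds (more memory per reducer); a smaller one means more rounds and more waiting.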