Does flatMap in Spark cause a shuffle?

Does flatMap in Spark behave like the map function and therefore cause no shuffle, or does it trigger a shuffle? I suspect it does cause one. Can anyone confirm this?

+4
2 answers

Neither map nor flatMap causes a shuffle. The operations that do cause a shuffle are:

  • Repartition operations:
    • repartition
    • coalesce
  • ByKey operations (except for counting):
    • groupByKey
    • reduceByKey
  • Join operations:
    • cogroup
    • join
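One way to check this yourself is to inspect an RDD's dependencies. The sketch below is only an illustration, assuming spark-core is on the classpath and a local master; the object name ShuffleCheck and the helper hasShuffleDependency are made up for this example and are not part of Spark.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ShuffleCheck {
  // Hypothetical helper: true if the RDD depends on its parent through a shuffle.
  def hasShuffleDependency(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads.
    val conf = new SparkConf().setAppName("shuffle-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val lines  = sc.parallelize(Seq("a b", "b c", "c a"), numSlices = 2)
      val words  = lines.flatMap(_.split(" "))   // narrow dependency
      val pairs  = words.map(w => (w, 1))        // narrow dependency
      val counts = pairs.reduceByKey(_ + _)      // ByKey operation, needs a shuffle

      println(s"flatMap shuffles:     ${hasShuffleDependency(words)}")
      println(s"map shuffles:         ${hasShuffleDependency(pairs)}")
      println(s"reduceByKey shuffles: ${hasShuffleDependency(counts)}")

      // The lineage printout marks the stage boundary introduced by the shuffle.
      println(counts.toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

With this setup the first two checks should report false and the reduceByKey check should report true, which matches the list above.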

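Although the set of elements in each partition of newly shuffled data is deterministic, and so is the ordering of the partitions themselves, the ordering of the elements within a partition is not. If you need predictably ordered data after a shuffle, you can use:

  • mapPartitions to sort each partition using, for example, .sorted
  • repartitionAndSortWithinPartitions to sort partitions efficiently while repartitioning
  • sortBy to produce a globally ordered RDD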

Source: http://spark.apache.org/docs/latest/programming-guide.html#shuffle-operations

+2

There is no shuffle. Here is the source of both functions:

/**
 * Return a new RDD by applying a function to all elements of this RDD.
 */
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}

/**
 *  Return a new RDD by first applying a function to all elements of this
 *  RDD, and then flattening the results.
 */
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.flatMap(cleanF))
}

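As you can see, RDD.flatMap simply calls flatMap on the Scala iterator that represents each partition, so no data moves between partitions.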
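To make the per-partition behaviour concrete, here is a toy model in plain Scala with no Spark dependency. It is only a sketch: the Partitions type and the flatMapPartitions function are invented for illustration, and nested Vectors stand in for an RDD's partitions.

```scala
object FlatMapPerPartition {
  // Toy stand-in for an RDD: each inner Vector is one partition.
  type Partitions[T] = Vector[Vector[T]]

  // Same shape as the Spark source above: the function is applied to each
  // partition's iterator on its own, so no element changes partition.
  def flatMapPartitions[T, U](parts: Partitions[T])(f: T => Seq[U]): Partitions[U] =
    parts.map(partition => partition.iterator.flatMap(f).toVector)

  def main(args: Array[String]): Unit = {
    val parts: Partitions[String] = Vector(Vector("a b", "c"), Vector("d e f"))
    val words = flatMapPartitions(parts)(_.split(" ").toSeq)
    println(words) // prints Vector(Vector(a, b, c), Vector(d, e, f))
  }
}
```

The number of partitions is unchanged and every output element stays in the partition of the input element that produced it, which is why no shuffle is needed.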

+3
