Spark Scala: Understanding reduceByKey(_ + _)

I can't understand reduceByKey(_ + _) in the first Spark example with Scala:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)  // <-- I can't understand this line
    wordCounts.saveAsTextFile(outputPath)
  }
}
2 answers

The reduce operation takes two elements and produces a third by applying a function to those two parameters.

The code you show is equivalent to the following:

reduceByKey((x, y) => x + y)

Instead of defining dummy variables and writing a lambda, Scala is smart enough to figure out that you are trying to apply a function (the sum, in this case) to any two parameters it receives, hence the shorter syntax:

reduceByKey(_ + _)
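To see the same shorthand outside Spark, here is a minimal sketch using plain Scala collections (no Spark needed); each underscore stands for one parameter, in order of appearance:

// With an expected function type, _ + _ expands to (x, y) => x + y.
val add: (Int, Int) => Int = _ + _
println(add(3, 4))                           // 7

// The same placeholder syntax works with reduce on a plain List.
val total = List(1, 2, 3, 4).reduce(_ + _)   // ((1 + 2) + 3) + 4
println(total)                               // 10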

reduceByKey takes a function of two parameters, applies it pairwise to the values that share a key, and returns the combined result for each key.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)

For example:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_ + _)

println("The sum of the numbers one through five is " + sum)

Output:

The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15
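To connect reduceLeft back to reduceByKey, here is a rough simulation of the per-key merging with plain Scala collections (Spark performs the merge in a distributed way, but the combining logic is the same idea):

// Group pairs by key, then reduce each key's values with _ + _.
val pairs = List(("a", 1), ("b", 1), ("a", 1))
val counts = pairs
  .groupBy(_._1)
  .map { case (key, kvs) => key -> kvs.map(_._2).reduceLeft(_ + _) }
println(counts)   // e.g. Map(a -> 2, b -> 1) (map ordering may vary)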

Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y)
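The placeholder shorthand works with any binary function, not just +. For instance, if the values are lists, _ ++ _ concatenates them per key; a sketch with plain Scala collections:

// Concatenate the list values that share a key.
val tagged = List(("x", List(1)), ("y", List(2)), ("x", List(3)))
val merged = tagged
  .groupBy(_._1)
  .map { case (k, kvs) => k -> kvs.map(_._2).reduceLeft(_ ++ _) }
println(merged)   // e.g. Map(x -> List(1, 3), y -> List(2))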

