Spark Scala: Understanding reduceByKey(_ + _)

I can't understand reduceByKey(_ + _) in the first Spark example with Scala:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)  // <-- I can't understand this line
    wordCounts.saveAsTextFile(outputPath)
  }
}
2 answers

The reduce operation takes two elements and produces a third by applying a function to those two parameters.

The code you show is equivalent to the following:

reduceByKey((x, y) => x + y)

Instead of defining dummy variables and writing a lambda, Scala is smart enough to figure out that you are trying to apply a function (the sum, in this case) to any two parameters it receives, hence the shorter syntax:

reduceByKey(_ + _)
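To see the same shorthand outside Spark, here is a minimal sketch using plain Scala collections (no Spark needed); each underscore stands for one parameter, in order of appearance:

// With an expected function type, _ + _ expands to (x, y) => x + y.
val add: (Int, Int) => Int = _ + _
println(add(3, 4))                           // 7

// The same placeholder syntax works with reduce on a plain List.
val total = List(1, 2, 3, 4).reduce(_ + _)   // ((1 + 2) + 3) + 4
println(total)                               // 10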

reduceByKey takes a function of two parameters, applies it pairwise to the values that share a key, and returns the combined result for each key.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)

For example:

val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_ + _)

println("The sum of the numbers one through five is " + sum)

Output:

The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15
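To connect reduceLeft back to reduceByKey, here is a rough simulation of the per-key merging with plain Scala collections (Spark performs the merge in a distributed way, but the combining logic is the same idea):

// Group pairs by key, then reduce each key's values with _ + _.
val pairs = List(("a", 1), ("b", 1), ("a", 1))
val counts = pairs
  .groupBy(_._1)
  .map { case (key, kvs) => key -> kvs.map(_._2).reduceLeft(_ + _) }
println(counts)   // e.g. Map(a -> 2, b -> 1) (map ordering may vary)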

Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y)
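The placeholder shorthand works with any binary function, not just +. For instance, if the values are lists, _ ++ _ concatenates them per key; a sketch with plain Scala collections:

// Concatenate the list values that share a key.
val tagged = List(("x", List(1)), ("y", List(2)), ("x", List(3)))
val merged = tagged
  .groupBy(_._1)
  .map { case (k, kvs) => k -> kvs.map(_._2).reduceLeft(_ ++ _) }
println(merged)   // e.g. Map(x -> List(1, 3), y -> List(2))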

