Efficiency of flatMap vs. map followed by reduce in Spark

I have a text file sherlock.txt containing several lines of text. I load it into the Spark shell using:

val textFile = sc.textFile("sherlock.txt")

My goal is to count the number of words in the file. I came across two alternative ways to do this.

The first uses flatMap:

textFile.flatMap(line => line.split(" ")).count()

The second uses map followed by reduce:

textFile.map(line => line.split(" ").size).reduce((a, b) => a + b)

Both give the correct result. I want to know the differences in time and space complexity between the two implementations above, if there really are any.

Does Scala translate both into the most efficient form?


For what it's worth, I would use map with sum instead:

textFile.map(_.split(" ").size).sum
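One detail to be aware of: sum on an RDD of numbers returns a Double, so if you want an integral count you can convert it back, e.g. (illustrative variable name):

// sum on an RDD yields a Double; convert if you want a Long word count.
val wordCount: Long = textFile.map(_.split(" ").size).sum.toLong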

Regarding complexity, there is no significant difference: both versions have to read the whole input, and line.split(" ") allocates an Array of tokens in both cases, so the per-line work is the same.
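To make the comparison concrete, here is the same pair of pipelines on plain Scala collections (ignoring Spark's distribution; the input is illustrative):

val lines = Seq("to be or not to be", "that is the question")

// flatMap + count: materialize every token, then count the tokens.
lines.flatMap(_.split(" ")).size                 // 10

// map + reduce: count tokens per line, then add the per-line counts.
lines.map(_.split(" ").length).reduce(_ + _)     // 10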

count is defined as:

def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum  

where Utils.getIteratorSize is essentially a linear traversal of the Iterator that counts elements one by one, and sum over the resulting per-partition counts is equivalent to:

_.fold(0.0)(_ + _)
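For intuition, a minimal sketch of what an iterator-size helper of that kind boils down to (a simplification, not the exact Spark source):

def getIteratorSize[T](iterator: Iterator[T]): Long = {
  // Walk the iterator once, counting elements: O(n) time, O(1) extra space.
  var count = 0L
  while (iterator.hasNext) {
    iterator.next()
    count += 1L
  }
  count
}

So in both of your versions the cost is dominated by splitting each line and making one linear pass over the results.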
