I'm trying to wrap my head around the whole concept of Spark. I think I have a very rudimentary understanding of the Spark platform. From what I understand, Spark has the concept of an RDD, which is a collection of "stuff" in memory, so processing is faster. You transform RDDs using methods like map and flatMap. Because transformations are lazy, they are not processed until you invoke an action on the final RDD. What I don't understand is: when you perform an action, do the transformations run in parallel? Can you assign workers to carry out the action in parallel?
For example, let's say I have a text file that I load into an RDD:
    lines = // load RDD
    lines.map(SomeFunction())
    lines.count()
What exactly is going on? Does SomeFunction() process a partition of the RDD? What is the parallel aspect?
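To make this concrete, here is a runnable version of that snippet, as a minimal sketch assuming a spark-shell session (where the SparkContext sc is predefined); the file name input.txt and someFunction are made-up placeholders:

    val lines = sc.textFile("input.txt")          // load a text file as an RDD[String]
    def someFunction(s: String): Int = s.length   // hypothetical per-line function

    val mapped = lines.map(someFunction)          // a transformation: lazy, nothing runs yet
    println(mapped.count())                       // an action: this triggers the actual job

(Note that map returns a new RDD, so its result has to be kept, as above, for count to see the mapped data; calling lines.count() after a discarded lines.map(...) would count the original lines.)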
lines is simply the name of the RDD data structure, which resides in the driver and represents a partitioned list of strings. The partitions are managed on each of your worker nodes when they are needed.
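As a small illustration (again assuming a spark-shell session with sc predefined), the driver-side handle knows about its partitioning without pulling any data back to the driver:

    val lines = sc.parallelize(Seq("first line", "second line", "third line"), 2)
    println(lines.getNumPartitions)    // 2: the handle knows how the data is partitioned
    println(lines.partitions.length)   // the same information via the partitions array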
When you invoke the action count, Spark goes back through the transformations on a per-partition basis: it asks each worker node to apply SomeFunction to every line in its portion of the data (a partition) and to count the results. Each worker sends its count back to the driver, which sums them into the final answer. That is the parallel aspect: every worker runs SomeFunction over its own partitions at the same time, so SomeFunction executes in parallel, partition by partition.
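A sketch of that mechanism (spark-shell; the data and partition count are made up): each partition is counted independently, and summing the per-partition counts gives the same answer as count itself:

    val rdd = sc.parallelize(1 to 100, 4)                            // 4 partitions
    val perPartition = rdd.mapPartitions(it => Iterator(it.size.toLong))
    println(perPartition.collect().sum)                              // 100
    println(rdd.count())                                             // 100, computed much the same way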
An RDD is, by default, a partitioned and distributed collection: the data is split into partitions, and those partitions are spread across the nodes of the cluster. When you load a file into an RDD, Spark decides how to split it, and each worker node ends up holding one or more of the resulting partitions.
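You can see that split directly with glom, which turns each partition into an array (a small spark-shell sketch; in local mode the "nodes" are really just threads):

    val rdd = sc.parallelize('a' to 'i', 3)                     // ask for 3 partitions
    rdd.glom().collect().foreach(p => println(p.mkString(" ")))
    // prints one line per partition, e.g. "a b c", "d e f", "g h i"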
map is also a transformation. Say your file is split into partitions A1, A2 and A3, and Spark has worker nodes N1, N2 and N3, each holding one of the partitions. Calling map(someFunction()) means that N1 runs someFunction on every element of A1, while the other nodes do the same with their own partitions, all at the same time.
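As a small sketch of that behaviour (spark-shell again; the partition ids stand in for A1, A2 and A3), mapPartitionsWithIndex exposes which partition each element is processed in:

    val data = sc.parallelize(1 to 6, 3)
    val tagged = data.mapPartitionsWithIndex { (pid, it) =>
      it.map(x => s"partition $pid handled element $x")
    }
    tagged.collect().foreach(println)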
count, "N1, , ", node. , collect . , , , RDD node ( ..).
collect
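For instance (spark-shell; the numbers here are small enough that even collect would be safe), count only moves one number per partition to the driver, and take(n) is the usual safe way to peek at a few elements:

    val rdd = sc.parallelize(1 to 1000000, 8)
    println(rdd.count())                  // cheap: only per-partition counts travel to the driver
    println(rdd.take(5).mkString(", "))   // safe peek: only 5 elements come back
    // rdd.collect()                      // would ship all 1,000,000 elements to the driver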
Because transformations are lazy, none of this work happens when you write map: Spark merely records what should be done. It builds up a plan of the transformations (the lineage, a DAG of operations) and only executes it, partition by partition, when an action such as count or collect actually demands a result.
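You can watch this laziness in a spark-shell sketch: defining the mapped RDD below returns immediately, toDebugString prints the lineage Spark has recorded, and only the final count launches a job:

    val doubled = sc.parallelize(1 to 10000, 4).map(_ * 2)   // returns instantly: just a plan
    println(doubled.toDebugString)                           // the recorded lineage
    println(doubled.count())                                 // only now does any work happen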
In short, this is where the parallelism in Spark comes from: the partitions. Each worker processes its own partitions independently and in parallel, and the driver coordinates the job and combines the results.