What is the exact difference between DStream transform and map in Spark Streaming?

I am trying to understand the transform operation on a Spark DStream in Spark Streaming.

I know that transform is significantly more general than map, but can someone give me a real-world example or a clear case that distinguishes transform from map?

+11
apache-spark spark-streaming
6 answers

The transform function in Spark Streaming allows you to use any of Apache Spark's transformations on the underlying RDDs of the stream. map is used to transform one element into another element and can be implemented using transform . Essentially, map works on the elements of the DStream, while transform allows you to work with the RDDs of the DStream. You may find http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams useful.
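To make that relationship concrete, here is a minimal sketch (the function names and the `nums` parameter are illustrative, not from the question) showing that a map over a DStream can be expressed via transform, because transform hands you each batch's underlying RDD:

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical DStream[Int]; the two functions below are equivalent.
// map rewrites each element directly, while transform exposes each
// batch's RDD, on which the same element-wise map can be applied.
def addOneWithMap(nums: DStream[Int]): DStream[Int] =
  nums.map(x => x + 1)

def addOneWithTransform(nums: DStream[Int]): DStream[Int] =
  nums.transform(rdd => rdd.map(x => x + 1))
```

Both produce the same DStream; the difference is only in which level of the abstraction you operate on.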

+16

map is an element-wise transformation, while transform is an RDD-level transformation.

Map


map(func): Return a new DStream by passing each element of the source DStream through the function func.

Here is an example that demonstrates both the map and transform operations on a DStream.

    import scala.collection.mutable.Queue

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[*]").setAppName("StreamingTransformExample")
    val ssc = new StreamingContext(conf, Seconds(5))

    val rdd1 = ssc.sparkContext.parallelize(Array(1, 2, 3))
    val rdd2 = ssc.sparkContext.parallelize(Array(4, 5, 6))

    val rddQueue = new Queue[RDD[Int]]
    rddQueue.enqueue(rdd1)
    rddQueue.enqueue(rdd2)

    val numsDStream = ssc.queueStream(rddQueue, true)
    val plusOneDStream = numsDStream.map(x => x + 1)
    plusOneDStream.print()

The map operation adds 1 to each element across all the RDDs in the DStream, giving the output shown below:

    -------------------------------------------
    Time: 1501135220000 ms
    -------------------------------------------
    2
    3
    4

    -------------------------------------------
    Time: 1501135225000 ms
    -------------------------------------------
    5
    6
    7

Transform


transform(func): Return a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.

    val commonRdd = ssc.sparkContext.parallelize(Array(0))
    val combinedDStream = numsDStream.transform(rdd => rdd.union(commonRdd))
    combinedDStream.print()

transform allows you to perform RDD operations such as join, union, etc. on the RDDs within the DStream; the code example given here will produce the output below:

    -------------------------------------------
    Time: 1501135490000 ms
    -------------------------------------------
    1
    2
    3
    0

    -------------------------------------------
    Time: 1501135495000 ms
    -------------------------------------------
    4
    5
    6
    0

    -------------------------------------------
    Time: 1501135500000 ms
    -------------------------------------------
    0

    -------------------------------------------
    Time: 1501135505000 ms
    -------------------------------------------
    0
    -------------------------------------------

Here the commonRdd , which contains the single element 0 , is unioned with every underlying RDD of the DStream.

+4

A DStream consists of several RDDs, since each batch interval produces a different RDD.
So by using transform(), you get the opportunity to apply an RDD operation to the entire DStream.

Example from Spark Docs: http://spark.apache.org/docs/latest/streaming-programming-guide.html#transform-operation
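The linked docs illustrate this with a join against a precomputed RDD for data cleaning. A minimal sketch of that pattern under assumed names (wordCounts, spamInfoRDD, and the spam-flag layout are hypothetical, not from the docs verbatim):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical: wordCounts is a DStream[(String, Int)] of per-batch counts,
// spamInfoRDD is a static RDD[(String, Boolean)] loaded once before streaming starts.
def cleaned(wordCounts: DStream[(String, Int)],
            spamInfoRDD: RDD[(String, Boolean)]): DStream[(String, Int)] =
  wordCounts.transform { rdd =>
    rdd.leftOuterJoin(spamInfoRDD)                                    // join each batch with the static RDD
       .filter { case (_, (_, isSpam)) => !isSpam.getOrElse(false) }  // drop words flagged as spam
       .map { case (word, (count, _)) => (word, count) }              // back to (word, count)
  }
```

A plain map could not do this, because the join is an operation between two RDDs, not on a single element.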

+1

The transform function in Spark Streaming allows you to perform any transformation on the underlying RDDs of the stream. For example, you can join two RDDs in streaming using transform, where one RDD is built from a text file or a parallelized collection, and the other RDD comes from a stream of text files / sockets, etc.

map works on each element of the RDD in a particular batch and results in an RDD after applying the function passed to map.

+1

Example 1)

Men standing in a line enter a room, change their dress, and then marry the women of their choice.

1) Changing dress is a map operation (they transform themselves, attribute by attribute).

2) Getting married is like a join / filter operation: it depends on others as well, which we can call a real transform operation.

Example 2) Students enter a college; some attend 2 lectures, some attend 4, and so on.

1) Attending lectures is a map operation: it is something the students do on their own.

2) But identifying what a lecturer taught them depends on the lecturer's RDD data and his agenda.

Think of the transform operation as involving a dimension or static table that you want to filter or validate against to identify the data you need, where dropping records may even be intended.

0

Suppose I have the data "Hello How" in the interval 0-1 and "Are You" in the next interval 1-2. Then, with a map and reduceByKey word-count example like the one above, the output will be (Hello, 1) and (How, 1) for the first batch, and (Are, 1) and (You, 1) for the next batch. But for the same example using the transform function, what would the difference in output be?

0
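For the scenario in this comment, a hedged sketch (the name `lines` is illustrative): both pipelines below produce the same per-batch counts, because transform simply moves the same flatMap / map / reduceByKey chain onto each underlying RDD, so the output per batch does not differ.

```scala
import org.apache.spark.streaming.dstream.DStream

// Word count written at the DStream level...
def countsWithMap(lines: DStream[String]): DStream[(String, Int)] =
  lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)

// ...and the equivalent written at the RDD level via transform.
def countsWithTransform(lines: DStream[String]): DStream[(String, Int)] =
  lines.transform(rdd =>
    rdd.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _))
```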
