map is an element-wise transformation, while transform is an RDD-level transformation.
Map
map(func): returns a new DStream by passing each element of the source DStream through the function func.
Here is an example that demonstrates both the map and the transform operations on a DStream:
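    import scala.collection.mutable.Queue

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[*]").setAppName("StreamingTransformExample")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Two small RDDs that are fed to the stream through a queue
    val rdd1 = ssc.sparkContext.parallelize(Array(1, 2, 3))
    val rdd2 = ssc.sparkContext.parallelize(Array(4, 5, 6))
    val rddQueue = new Queue[RDD[Int]]
    rddQueue.enqueue(rdd1)
    rddQueue.enqueue(rdd2)

    // oneAtATime = true: one queued RDD is consumed per batch interval
    val numsDStream = ssc.queueStream(rddQueue, true)
    val plusOneDStream = numsDStream.map(x => x + 1)
    plusOneDStream.print()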
The map operation adds 1 to every element of every RDD in the DStream, which gives the output shown below:
    -------------------------------------------
    Time: 1501135220000 ms
    -------------------------------------------
    2
    3
    4

    -------------------------------------------
    Time: 1501135225000 ms
    -------------------------------------------
    5
    6
    7
Transform
transform(func): returns a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream. This can be used to perform arbitrary RDD operations on the DStream.
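    val commonRdd = ssc.sparkContext.parallelize(Array(0))

    // The function receives each batch's RDD as a whole and returns a new RDD
    val combinedDStream = numsDStream.transform(rdd => rdd.union(commonRdd))
    combinedDStream.print()

    // Nothing is computed until the streaming context is started
    ssc.start()
    ssc.awaitTermination()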
transform lets you apply RDD operations such as join, union, etc. to the RDDs in the DStream. The code example given here produces the output below:
    -------------------------------------------
    Time: 1501135490000 ms
    -------------------------------------------
    1
    2
    3
    0

    -------------------------------------------
    Time: 1501135495000 ms
    -------------------------------------------
    4
    5
    6
    0

    -------------------------------------------
    Time: 1501135500000 ms
    -------------------------------------------
    0

    -------------------------------------------
    Time: 1501135505000 ms
    -------------------------------------------
    0
Here commonRdd, which contains the single element 0, is unioned with every underlying RDD in the DStream. Once the queue is empty, the batches contain only that 0.
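To make the difference concrete without a Spark cluster, here is a minimal plain-Scala sketch. It is only a model, not the Spark API: a "stream" is assumed to be a Vector of batches, each batch standing in for one RDD, and the names DStreamModel, mapStream and transformStream are hypothetical helpers invented for this illustration.

```scala
object DStreamModel {
  // Assumption: a batch plays the role of one RDD, a stream is a sequence of batches.
  type Batch = Vector[Int]
  type Stream = Vector[Batch]

  // Like DStream.map: the function sees one element at a time.
  def mapStream(s: Stream)(f: Int => Int): Stream = s.map(batch => batch.map(f))

  // Like DStream.transform: the function sees one whole batch at a time.
  def transformStream(s: Stream)(f: Batch => Batch): Stream = s.map(f)

  def main(args: Array[String]): Unit = {
    // Two filled batches followed by an empty one (the queue has run dry).
    val stream: Stream = Vector(Vector(1, 2, 3), Vector(4, 5, 6), Vector.empty)
    val common: Batch = Vector(0)

    // Element-wise: add 1 to every element.
    println(mapStream(stream)(_ + 1))

    // Batch-wise: append the common batch to every batch, like rdd.union(commonRdd).
    println(transformStream(stream)(batch => batch ++ common))

    // A map can always be written as a transform, but not the other way round.
    println(transformStream(stream)(_.map(_ + 1)) == mapStream(stream)(_ + 1))
  }
}
```

In this model the union needs access to the whole batch, which an element-wise function never gets; that is why it has to go through the transform-style helper, while adding 1 works with either.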
Remis haroon