I have a Spark Streaming application that consumes Kafka messages, and I want to process all messages from the last 10 minutes together. There seem to be two approaches to getting the work done:
val ssc = new StreamingContext(new SparkConf(), Minutes(10))
val dstream = ....
and
val ssc = new StreamingContext(new SparkConf(), Seconds(1))
val dstream = ....
dstream.window(Minutes(10), Minutes(10))
and I just want to clarify whether there are performance differences between them.
To see what a window does, suppose the batch interval is 10 seconds, the window length is 60 seconds and the slide interval is 30 seconds. Every 30 seconds a computation is triggered over the last 60 seconds of data, i.e. over 6 batch RDDs: A, B, C, D, E, F. 30 seconds later the window covers D, E, F, G, H, I, so two consecutive windows share 3 batches.
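The batch/window arithmetic (batch interval 10 s, window 60 s, slide 30 s) can be illustrated without Spark at all; this is a plain-Scala sketch, and the `windowBatches` helper is hypothetical, not a Spark API:

```scala
// Plain-Scala illustration (no Spark): which batch RDDs does each window
// cover, given batchInterval = 10s, windowLength = 60s, slide = 30s?
object WindowOverlap {
  // Batches are labelled by index: 0 -> "A", 1 -> "B", ...
  def label(i: Int): String = ('A' + i).toChar.toString

  // Labels of the batches covered by the window ending at time `end` (seconds).
  def windowBatches(end: Int, windowLength: Int, batchInterval: Int): Seq[String] =
    ((end - windowLength) until end by batchInterval).map(t => label(t / batchInterval))

  def main(args: Array[String]): Unit = {
    val first  = windowBatches(60, 60, 10)  // window over t = 0..60
    val second = windowBatches(90, 60, 10)  // slid by 30s: t = 30..90
    println(first.mkString(", "))                   // A, B, C, D, E, F
    println(second.mkString(", "))                  // D, E, F, G, H, I
    println(first.intersect(second).mkString(", ")) // D, E, F -- the 3 shared batches
  }
}
```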
Under the hood, Spark does not copy the data for each window: a windowed DStream simply takes the union of the underlying batch RDDs, so the 60-second window above is one RDD built from RDDs A through F. Even though that window spans 6 batches, each record is stored only once; the windowed RDD just references the originals. In your case the window length equals the slide interval, so every batch belongs to exactly one window and is processed exactly once either way. If what you actually need is state carried across batches, look at updateStateByKey instead.