This is a Spark Streaming program written in Scala. It counts the words arriving on a socket in 1-second batches, so each result covers only one batch: the words from time 0 to 1, then the words from time 1 to 2, and so on. Is it possible to change this program so that the counts accumulate instead? That is, each word would be counted from time 0 up to now.
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("NetworkWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(1))

// Create a socket stream on the target ip:port and count the
// words in the input stream of \n-delimited text (e.g. generated by 'nc').
// Note that storage without replication is fine only when running locally;
// replication is necessary in a distributed scenario for fault tolerance.
val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()

ssc.start()
ssc.awaitTermination()
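One way to accumulate counts across batches is Spark Streaming's stateful operator `updateStateByKey`, which keeps a running value per key and folds each batch's counts into it. The sketch below shows the per-key update function (plain Scala, so it can be tested in isolation) and, in comments, how it would be wired into the program above; the checkpoint path `"checkpoint-dir"` is a placeholder, since stateful operators require checkpointing to be enabled.

```scala
// Update function for updateStateByKey: add this batch's counts for a word
// to its running total. newValues holds the counts seen in the current
// batch; runningCount is the accumulated state (None for a new word).
def updateCount(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(newValues.sum + runningCount.getOrElse(0))

// Wiring into the program above (requires the spark-streaming dependency):
//   ssc.checkpoint("checkpoint-dir")  // placeholder path; required for state
//   val totalCounts = words.map(x => (x, 1)).updateStateByKey(updateCount _)
//   totalCounts.print()
```

With this in place, each print shows totals since the start of the stream rather than per-batch counts. Returning `None` from the update function would drop a key from the state, which this sketch never does, so totals only grow.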
scala distributed apache-spark spark-streaming
user2895478