Collect results from RDD in the dstream driver program

I have this function in a driver program that collects the result from rdds into an array and sends it back. However, despite the fact that RDD (in dstream) has data, the function returns an empty array ... What am I doing wrong?

def runTopFunction() : Array[(String, Int)] = {
        val topSearches = some function....
        val summary = new ArrayBuffer[(String,Int)]()
        topSearches.foreachRDD(rdd => {
            summary = summary.++(rdd.collect())
        })    

    return summary.toArray
}
+4
source share
2 answers

So, while foreachRDDdoing what you want to do, it is also non-blocking, which means that it will not wait until the entire thread has been processed. Since you are cal toArrayin your buffer right after the call foreachRDD, until any elements are processed.

+1
source

DStream.forEachRDD - DStream . , .

, , Dstream.forEachRDD " ", , .

, summary , :

  • , . -k.
  • (fs, db), topSearches dstream.
+1

All Articles