I have this function in a driver program that collects the result from rdds into an array and sends it back. However, despite the fact that RDD (in dstream) has data, the function returns an empty array ... What am I doing wrong?
def runTopFunction() : Array[(String, Int)] = { val topSearches = some function.... val summary = new ArrayBuffer[(String,Int)]() topSearches.foreachRDD(rdd => { summary = summary.++(rdd.collect()) }) return summary.toArray }
So, while foreachRDDdoing what you want to do, it is also non-blocking, which means that it will not wait until the entire thread has been processed. Since you are cal toArrayin your buffer right after the call foreachRDD, until any elements are processed.
foreachRDD
toArray
DStream.forEachRDD - DStream . , .
DStream.forEachRDD
DStream
, , Dstream.forEachRDD " ", , .
, summary , :
summary
topSearches