Luminous fluxes: enrichment of the reference data stream

I have a spark stream that is configured to read from a socket, does some data enrichment before publishing it to the rabbit queue. Enrichment looks at the information on the card that was created by reading a regular text file (Source.fromFile ...) before setting up the streaming context.

I have a feeling that this is actually not the case. On the other hand, when using StreamingContext, I can only read streams, not static files, as I could do with SparkContext.

I can try to resolve several contexts, but I'm not sure if this is the right way.

Any advice is appreciated.

+4
source share
2

, , , , , Spark - Broadcast. , , .

- , , "" broadcastVar.value .

DStream:

// could replace with Source.from File as well. This is just more practical
val data = sc.textFile("loopup.txt").map(toKeyValue).collectAsMap() 
// declare the broadcast variable
val bcastData = sc.broadcast(data)

... initialize streams ...

socketDStream.map{ elem => 
    // doing every step here explicitly for illustrative purposes. Usually, one would typically just chain these calls
    // get the map within the broadcast wrapper
    val lookupMap = bcastData.value
    // use the map to lookup some data
    val lookupValue = lookupMap.getOrElse(elem, "not found")
    // create the desired result
    (elem, lookupValue)
}
socketDStream.saveTo...
+6

, , Source.fromFile ( ).

SparkContext, streamingContext.sparkContext DStream transform foreachRDD.

+4

All Articles