HashMap as a broadcast variable in Spark Streaming?

I have some data that needs to be classified in a Spark stream. The classification keys and values are loaded into a HashMap at the start of the program, so each incoming record has to be matched against these keys and labelled accordingly.

I understand that Spark has broadcast variables and accumulators for distributing objects, but the examples in the tutorials only use simple variable types.

How can I share my HashMap with all Spark workers? Alternatively, is there a better way to do this?

I am writing my Spark application in Java.

1 answer

Yes, a HashMap can be broadcast just like any other serializable object. Use the SparkContext that backs the StreamingContext:

Scala:

val br = ssc.sparkContext.broadcast(Map(1 -> 2))

Java:

Broadcast<HashMap<String, String>> br = ssc.sparkContext().broadcast(new HashMap<>());
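For example, a complete driver program might look roughly like the sketch below, assuming a Spark 2.x Streaming setup. The socket source on localhost:9999 and the ERROR/WARN keys are only placeholders standing in for your real input and classification map:

import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BroadcastLabelExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("BroadcastLabelExample")
                .setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Load the classification keys once on the driver, then broadcast them.
        HashMap<String, String> labels = new HashMap<>();
        labels.put("ERROR", "critical");   // placeholder entries
        labels.put("WARN", "minor");
        Broadcast<HashMap<String, String>> br = jssc.sparkContext().broadcast(labels);

        // Placeholder source: a socket stream of text lines.
        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Read the broadcast map inside the transformation; br.value() is
        // available on every executor without re-shipping the map per record.
        JavaDStream<String> labelled = lines.map(line -> {
            String key = line.split(" ", 2)[0];
            String label = br.value().getOrDefault(key, "unknown");
            return label + "\t" + line;
        });

        labelled.print();
        jssc.start();
        jssc.awaitTermination();
    }
}

The map is shipped to each executor once and cached there, so looking it up inside the lambda is cheap; only the label lookup happens per record.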
