Yes, you can use broadcast variables if your data is read-only (immutable). A broadcast variable must satisfy the following properties:
- Fits in memory
- Immutable
- Distributed to the cluster
So the only condition is that your data must fit in memory on a single node. This means the data should NOT be anything very large or beyond the memory limits, such as a massive table.
Each executor receives a copy of the broadcast variable, and all tasks in that executor read and use this data. It is like sending a large read-only dataset to every worker node in the cluster, i.e. it is shipped to each worker only once rather than with every task, and the executors (tasks) read the data.
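As a rough illustration of the above, here is a minimal PySpark sketch. The lookup table, the `enrich` helper, the `run_with_spark` function, the app name, and the `local[2]` master are all made up for the example; only `SparkContext`, `sc.broadcast`, `.value`, `parallelize`, `map`, `collect`, and `stop` are actual Spark calls. It assumes PySpark is installed and that the lookup table is small enough to fit in memory on each node.

```python
def enrich(code, lookup):
    """Map a country code to its name using a read-only lookup dict (hypothetical helper)."""
    return lookup.get(code, "unknown")


def run_with_spark():
    """Broadcast a small lookup table once and read it from every task."""
    # Imported here so that enrich() can be used and tested without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "broadcast-sketch")  # assumed local test setup
    try:
        # Shipped to each executor once, not once per task.
        lookup_bc = sc.broadcast({"US": "United States", "DE": "Germany"})

        codes = sc.parallelize(["US", "DE", "FR"])
        # Tasks only read lookup_bc.value; they never modify it.
        return codes.map(lambda c: enrich(c, lookup_bc.value)).collect()
    finally:
        sc.stop()
```

Calling `run_with_spark()` should return the country names for the known codes and `"unknown"` for the one that is missing from the table. If the table were too large to fit on a single node, a join between two distributed datasets would probably be the better fit than a broadcast.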
Kris