In Spark, is it possible to share data between two executors?

I have really large read-only data that I want all executors on the same node to share. Is this possible in Spark? I know you can broadcast variables, but can you broadcast really large arrays? Under the hood, does Spark share the data between executors on the same node? How can data be shared between the executor JVMs running on the same node?

+7
java scala apache-spark
1 answer

Yes, you can use broadcast variables if your data is read-only (immutable). A broadcast variable must satisfy the following properties:

  • Fits in memory
  • Immutable (read-only)
  • Distributed to the cluster

So the only real constraint is that your data must fit in memory on a single node. This means it should not be anything larger than the executor's memory limits, such as a massive table.

Each executor receives one copy of the broadcast variable, and all tasks running in that executor read/share this data. This is like shipping large read-only data to all worker nodes in the cluster, i.e. it is sent to each worker only once rather than with every task, and the executors' tasks then read the data locally.
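As a minimal sketch (assuming a running Spark application; the names `spark` and `largeLookup` are illustrative, not from the question), broadcasting a large read-only lookup table in Scala might look like this:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BroadcastExample").getOrCreate()
    val sc = spark.sparkContext

    // Large read-only data built once on the driver.
    val largeLookup: Map[Int, String] =
      (1 to 1000000).map(i => i -> s"value-$i").toMap

    // One copy is shipped to each executor, not one per task.
    val lookupBc = sc.broadcast(largeLookup)

    // Every task on an executor reads the same local copy via .value.
    val resolved = sc.parallelize(1 to 100)
      .map(i => lookupBc.value.getOrElse(i, "missing"))
    resolved.take(5).foreach(println)

    // Release the executor-side copies when no longer needed.
    lookupBc.unpersist()
    spark.stop()
  }
}
```

Note that the data is serialized once per executor JVM, so multiple tasks on the same node share a single in-memory copy; mutating `lookupBc.value` on an executor has no effect on other nodes, which is why broadcast data should be treated as immutable.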

+5
