As I understand it, an RDD is not data; it is a way to create data by applying transformations / filters to the source data.
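To make that distinction concrete, here is a toy sketch in plain Python. It is not Spark's API: `LazyDataset`, `source` and `read_log` are hypothetical names I made up for illustration. The object only records the steps to apply, and the source is not read until `collect()` is called.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for an RDD: a source plus a list of pending steps."""

    def __init__(self, source: Callable[[], Iterable[int]], steps=None):
        self._source = source            # how to (re)read the source data
        self._steps = list(steps or [])  # transformations recorded so far

    def map(self, fn):
        # Record the step; nothing is computed yet.
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred):
        # Same idea: only the recipe grows.
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    def collect(self) -> List[int]:
        # Only now is the recipe run against the source data.
        items = iter(self._source())
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


read_log = []  # one entry per actual read of the source


def source():
    read_log.append("read")
    return range(10)


even_squares = LazyDataset(source).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
reads_before = len(read_log)  # the pipeline is defined, the source is untouched
result = even_squares.collect()
reads_after = len(read_log)   # the source was read once, by collect()
```

Because the object holds a recipe rather than values, handing it to another application does not hand over any computed data, which is why the options below go through an external store.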
Another idea is to share the final data instead. That is, you store the RDD's contents in a data store, for example:
- HDFS (a Parquet file, etc.)
- Elasticsearch
- Apache Ignite (in-memory)
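The pattern itself is simple: one application writes the final data to a shared location, and another reads it back. Below is a minimal sketch using only the Python standard library, where a local JSON Lines file stands in for the real store. The function names, the file name and the sample records are assumptions for illustration; with Spark you would use the connector of whichever store you pick.

```python
import json
import tempfile
from pathlib import Path


def save_final_data(records, path):
    """Producer side: write the computed records, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_final_data(path):
    """Consumer side: read the records back without recomputing them."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# A temporary directory plays the role of the shared store (hypothetical path).
shared_path = Path(tempfile.mkdtemp()) / "final_data.jsonl"

# "Application A" writes its final result ...
save_final_data([{"id": 1, "score": 0.5}, {"id": 2, "score": 0.9}], shared_path)

# ... and "application B" picks it up from the same location.
loaded = load_final_data(shared_path)
```

The trade-off between the three options is mostly about where the data lives: files on HDFS are durable but go through disk, while an in-memory store such as Ignite avoids that round trip.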
I think you will like Apache Ignite: https://ignite.apache.org/features/igniterdd.html
Apache Ignite provides an implementation of the Spark RDD abstraction that lets you easily share state in memory across multiple Spark jobs, either within the same application or between different Spark applications.
IgniteRDD is implemented as a view over a distributed Ignite cache, which can be deployed within the process executing the Spark job, on a Spark worker, or in its own cluster.
(I'll let you dig through the documentation to find what you are looking for.)
Thomas decaux