Spark Async interface for Fold, Reduce, Aggregate?

In the official Spark RDD API:

https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/rdd/AsyncRDDActions.html

count, collect, foreach, and their variants have asynchronous versions (countAsync, collectAsync, foreachAsync, and so on) that return a future.

Why don't fold, reduce, and aggregate have this asynchronous / future interface? It seems like an important thing to have.

1 answer

TL;DR: It comes down to the difference between Spark "actions" and "transformations": https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#rdd-operations


Note that everything you listed as having an asynchronous variant is an "action", which means it starts processing the data immediately and tries to return the result synchronously. That can take a long time if there is a lot of data, so it is nice to have an asynchronous option.
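For a blocking action that has no built-in async variant, one common workaround is to submit the call to a thread pool and work with the returned future. The sketch below shows only that pattern in plain Python, with no Spark involved: `blocking_sum` is a hypothetical stand-in for a call such as `rdd.reduce(add)`, and the input list is made up for illustration.

```python
import functools
import operator
from concurrent.futures import ThreadPoolExecutor


def blocking_sum(values):
    # Hypothetical stand-in for a blocking Spark action such as rdd.reduce(add).
    return functools.reduce(operator.add, values)


executor = ThreadPoolExecutor(max_workers=1)

# submit() returns a Future immediately; the work runs on a pool thread.
future = executor.submit(blocking_sum, [1, 2, 3, 4, 5])

# The caller is free to do other work here, then blocks only when it
# actually needs the value.
result = future.result()

executor.shutdown()
```

With a real RDD the submitted callable would invoke the action itself; whether that suits a given job depends on how the application schedules concurrent Spark jobs.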

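By contrast, Spark "transformations" are evaluated lazily: calling one instantly creates a plan for the job, but no data is actually processed until you later apply an "action" to return the results, so a transformation has nothing to make asynchronous. This covers the keyed variants such as reduceByKey, foldByKey, and aggregateByKey. Be aware, though, that RDD.reduce, fold, and aggregate themselves are actions; they simply have no counterpart in AsyncRDDActions.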
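The lazy-versus-eager distinction can be illustrated with a toy model. This is not Spark code: `LazyDataset` and `traced_double` are hypothetical names invented for this sketch, which only mimics how a transformation records a plan while an action executes it.

```python
import functools
import operator


class LazyDataset:
    """Toy stand-in for an RDD: records a plan and runs it only on an action."""

    def __init__(self, source, plan=()):
        self._source = source
        self._plan = plan  # functions to apply to each element, in order

    def map(self, fn):
        # Transformation: returns at once and only extends the plan.
        return LazyDataset(self._source, self._plan + (fn,))

    def reduce(self, op):
        # Action: executes the whole plan now and blocks until it finishes.
        def apply_plan(x):
            for fn in self._plan:
                x = fn(x)
            return x

        return functools.reduce(op, (apply_plan(x) for x in self._source))


calls = []


def traced_double(x):
    calls.append(x)  # record that real work happened
    return 2 * x


doubled = LazyDataset([1, 2, 3]).map(traced_double)
calls_before_action = len(calls)  # no element has been processed yet

total = doubled.reduce(operator.add)  # the action triggers the work
```

In this model `map` costs nothing up front, while `reduce` is where the caller waits, which is the point at which a future-returning variant would be useful.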

In the meantime, do you have specific code or a particular problem that you are trying to solve with this?
