Spark Async interface for Fold, Reduce, Aggregate?

Question

In the official Spark RDD API:

https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/rdd/AsyncRDDActions.html

count, collect, foreach, and their variants have asynchronous versions that return a future.

Why don't fold, reduce, and aggregate have this asynchronous / future interface? It seems very important.

asynchronous future apache-spark

clay Mar 31 '15 at 15:45

1 answer

combinatorist · Answer 1 · 2017-12-27T06:11:29+0000

TL;DR: It comes down to the difference between Spark "actions" and "transformations": https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#rdd-operations

Note that everything you listed as having an asynchronous option is an "action", which means it starts processing the data immediately and blocks until it can return a result. That can take a long time when there is a lot of data, so it is nice to have an asynchronous option.
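To make the blocking versus asynchronous distinction concrete, here is a minimal sketch. It assumes Spark is on the classpath and uses a local master; the object name, app name, and data are placeholders chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.Await
import scala.concurrent.duration.Duration

object AsyncCountSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for illustration only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("async-count-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000000)

      // Blocking action: the call returns only after the job has finished.
      val n: Long = rdd.count()

      // Asynchronous variant from AsyncRDDActions: the job is submitted and a
      // FutureAction (a scala.concurrent.Future) is handed back.
      val pending = rdd.countAsync()

      // The driver thread could do other work here before waiting on the result.
      val m: Long = Await.result(pending, Duration.Inf)
      println(s"count = $n, countAsync = $m")
    } finally {
      sc.stop()
    }
  }
}
```

The `FutureAction` returned by `countAsync` also exposes `cancel()`, which is one practical reason to prefer the built-in asynchronous variants where they exist.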

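By contrast, Spark "transformations" (map and filter, for example) are evaluated lazily: calling one instantly creates a plan for the job, but no data is processed until you later apply an "action" to return the results. Note, though, that reduce, fold, and aggregate also return a value to the driver, and the programming guide linked above lists reduce among the actions, so they behave like the blocking calls described above rather than like lazy transformations.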
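For operations that have no `...Async` variant, one common workaround is to run the blocking call inside a `scala.concurrent.Future`. The sketch below is only an illustration under assumptions: Spark on the classpath, a local master, the global execution context, and placeholder names and data. It is not a built-in Spark API for asynchronous reduce.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object FutureReduceSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for illustration only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("future-reduce-sketch")
    val sc = new SparkContext(conf)
    try {
      // Transformation: map only records the plan; no job runs yet.
      val squares = sc.parallelize(1 to 1000).map(x => x.toLong * x)

      // reduce triggers a job and blocks until it completes, so it is run on
      // another thread by wrapping the call in a Future.
      val pendingSum: Future[Long] = Future { squares.reduce(_ + _) }

      // The driver thread is free here; wait only when the value is needed.
      val total = Await.result(pendingSum, Duration.Inf)
      println(s"sum of squares = $total")
    } finally {
      sc.stop()
    }
  }
}
```

Unlike the `FutureAction` returned by `countAsync` and friends, a plain `Future` has no `cancel()` method, so a job started this way cannot be cancelled through the future itself.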

In the meantime, is there specific code or a problem that you are trying to solve with this?
