Why would anyone run Spark / Flink on Tez?

The Tez paper by Saha et al. shows the following modular Hadoop 2 architecture with Tez:

[Figure: Hadoop 2 with Tez]

Why would anyone run Spark / Flink on Tez?

What are the benefits? Isn't it better to run them directly on YARN?

hadoop tez apache-spark apache-flink apache-tez
1 answer

If I understand correctly, running Spark on Tez could in theory produce a better-optimized DAG. This could help, for example, with iterative machine learning workloads.

The relevant passage from the paper is quoted below.

We were able to encode the post-compilation Spark DAG into a Tez DAG and run it successfully in a YARN cluster that was not running the Spark engine service. User-defined Spark code is serialized into a Tez processor payload and injected into a generic Spark processor, which deserializes and executes the user code. This allows unmodified Spark programs to run on YARN using Spark's own runtime operators ... Tez sessions also enable Spark machine learning iterations to run efficiently by submitting the per-iteration DAGs to a shared Tez session. This work is an experimental prototype and is not part of the Spark project.
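To make the mechanism in the quote more concrete, here is a minimal Python sketch of the "serialize user code into a payload, let a generic processor deserialize and run it" idea. It is only an illustration of the pattern, not Tez or Spark code: `make_payload`, `GenericProcessor`, and `user_map` are hypothetical names I made up, and `marshal` stands in for whatever serialization the prototype actually used.

```python
import builtins
import marshal
import types


def make_payload(user_fn):
    """Serialize the user's function code into opaque bytes (the 'payload')."""
    # marshal handles plain code objects; closures are out of scope for this sketch.
    return marshal.dumps(user_fn.__code__)


class GenericProcessor:
    """Hypothetical generic processor: it only ever sees bytes, never the user's source."""

    def __init__(self, payload):
        self.payload = payload

    def run(self, records):
        # Deserialize the payload back into a callable, then apply it to each record.
        code = marshal.loads(self.payload)
        user_fn = types.FunctionType(code, {"__builtins__": builtins})
        return [user_fn(record) for record in records]


# Hypothetical user-defined code, analogous to a function passed to a Spark map.
def user_map(x):
    return x * x + 1


payload = make_payload(user_map)        # client side: code becomes bytes
processor = GenericProcessor(payload)   # cluster side: generic task receives bytes
result = processor.run([1, 2, 3])
print(result)
```

The point of the pattern is that the execution engine only needs one generic processor class, while the actual logic travels with the DAG as data.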

Thus, it seems this combination has never been implemented beyond an experimental setup, so even if there are decent reasons to combine Tez with tools like Spark, it is not something you can use in a project today.

In addition, unless you have a very specific workload, I would personally be surprised if a Tez DAG significantly outperformed the regular Spark DAG.
