How to extend Spark Catalyst optimizer with custom rules?

I want to use Catalyst rules to convert a SQL query against a star schema ( https://en.wikipedia.org/wiki/Star_schema ) into a SQL query against a denormalized schema, where some fields from the dimension tables are duplicated in the fact table. I tried to find extension points for adding my own rules to perform the conversion described above, but I did not find any. So I have the following questions:

  • How can I add my own rules for the catalyst optimizer?
  • Is there any other solution for implementing the function described above?
2 answers

As a hint: as of Spark 2.0, you can plug in extraStrategies and extraOptimizations through SparkSession.experimental.
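
For instance, here is a minimal sketch of a custom optimizer rule registered through extraOptimizations. The name MyOptimizationRule is made up for illustration; the rule just logs each logical plan and returns it unchanged:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: prints every logical plan it is given and
// returns it unchanged, so it has no effect on query results.
object MyOptimizationRule extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = {
    println(s"Optimizer saw plan:\n$plan")
    plan
  }
}

val spark = SparkSession.builder().master("local").getOrCreate()
spark.experimental.extraOptimizations = Seq(MyOptimizationRule)

A real denormalization rule would instead pattern-match inside apply on the joins between the fact and dimension tables and rewrite them.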


Following @Ambling's hint, you can use sparkSession.experimental.extraStrategies to add your own strategies to the SparkPlanner.

Here is an example strategy that simply prints "Hello world!" to the console:

import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

object MyStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = {
    println("Hello world!")
    Nil // returning Nil tells the planner this strategy does not handle the plan
  }
}

and an example run:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// register the strategy, then trigger planning so it fires
spark.experimental.extraStrategies = Seq(MyStrategy)
val q = spark.catalog.listTables.filter(t => t.name == "five")
q.explain(true)
spark.stop()

You can find a working example project on GitHub: https://github.com/bartekkalinka/spark-custom-rule-executor

