`org.apache.spark.sql.Column` provides a `like` method, but as of now (Spark 1.6.0 / 2.0.0) it only works with string literals. However, you can use raw SQL:
```scala
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // Make sure you use HiveContext
import sqlContext.implicits._        // Optional, just to be able to use toDF

val df = Seq(("foo", "bar"), ("foobar", "foo"), ("foobar", "bar")).toDF("a", "b")
df.registerTempTable("df")

sqlContext.sql("SELECT * FROM df WHERE a LIKE CONCAT('%', b, '%')")
// +------+---+
// |     a|  b|
// +------+---+
// |foobar|foo|
// |foobar|bar|
// +------+---+
```
or `expr` / `selectExpr`:
```scala
df.selectExpr("a LIKE CONCAT('%', b, '%')")
```
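For reference, `LIKE` with `CONCAT('%', b, '%')` matches whenever `a` contains `b` as a literal substring. A minimal plain-Scala sketch of SQL `LIKE` semantics, runnable without Spark (`likeToRegex` is a hypothetical helper for illustration, not part of Spark's API):

```scala
import java.util.regex.Pattern

// SQL LIKE semantics: '%' matches any run of characters, '_' matches a
// single character, everything else is matched literally.
def likeToRegex(pattern: String): Pattern = {
  val sb = new StringBuilder("(?s)")
  pattern.foreach {
    case '%' => sb.append(".*")
    case '_' => sb.append(".")
    case c   => sb.append(Pattern.quote(c.toString))
  }
  Pattern.compile(sb.toString)
}

// CONCAT('%', b, '%') builds "%" + b + "%", i.e. a "contains b" test:
val matched = likeToRegex("%foo%").matcher("foobar").matches()
```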
On Spark 1.5 this will require a `HiveContext`. If for some reason the Hive context is not an option, you can use a custom `udf`:
```scala
import org.apache.spark.sql.functions.udf

// Substring containment, equivalent to a LIKE '%' || b || '%'
val simple_like = udf((s: String, p: String) => s.contains(p))
df.where(simple_like($"a", $"b"))

// Treats the second column as a regular expression
val regex_like = udf((s: String, p: String) =>
  new scala.util.matching.Regex(p).findFirstIn(s).nonEmpty)
df.where(regex_like($"a", $"b"))
```
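The predicate logic inside these UDFs can be checked outside Spark. A plain-Scala sketch over the same sample rows as above (assuming the same data; function names are illustrative):

```scala
// Same rows as the DataFrame example above
val rows = Seq(("foo", "bar"), ("foobar", "foo"), ("foobar", "bar"))

// Predicates matching the two UDF bodies
def simpleLike(s: String, p: String): Boolean = s.contains(p)
def regexLike(s: String, p: String): Boolean =
  new scala.util.matching.Regex(p).findFirstIn(s).nonEmpty

// Mirrors df.where(simple_like($"a", $"b")): keeps only the "foobar" rows,
// matching the output of the raw-SQL LIKE query above.
val kept = rows.filter { case (a, b) => simpleLike(a, b) }
```

Note the difference between the two: `simple_like` treats `b` as a literal substring, while `regex_like` interprets `b` as a regular expression, so regex metacharacters in `b` (such as `.` or `*`) will change its behavior.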
zero323