`org.apache.spark.sql.Column` provides a `like` method, but as of now (Spark 1.6.0 / 2.0.0) it only works with string literals. However, you can use raw SQL:
```scala
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // Make sure you use HiveContext
import sqlContext.implicits._        // Optional, just to be able to use toDF

val df = Seq(("foo", "bar"), ("foobar", "foo"), ("foobar", "bar")).toDF("a", "b")
df.registerTempTable("df")

sqlContext.sql("SELECT * FROM df WHERE a LIKE CONCAT('%', b, '%')")
// +------+---+
// |     a|  b|
// +------+---+
// |foobar|foo|
// |foobar|bar|
// +------+---+
```
or `expr` / `selectExpr`:
```scala
df.selectExpr("a LIKE CONCAT('%', b, '%')")
```
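For reference, `LIKE` with `CONCAT('%', b, '%')` matches whenever `a` contains `b` as a literal substring. A minimal plain-Scala sketch of SQL `LIKE` semantics, runnable without Spark (`likeToRegex` is a hypothetical helper for illustration, not part of Spark's API):

```scala
import java.util.regex.Pattern

// SQL LIKE semantics: '%' matches any run of characters, '_' matches a
// single character, everything else is matched literally.
def likeToRegex(pattern: String): Pattern = {
  val sb = new StringBuilder("(?s)")
  pattern.foreach {
    case '%' => sb.append(".*")
    case '_' => sb.append(".")
    case c   => sb.append(Pattern.quote(c.toString))
  }
  Pattern.compile(sb.toString)
}

// CONCAT('%', b, '%') builds "%" + b + "%", i.e. a "contains b" test:
val matched = likeToRegex("%foo%").matcher("foobar").matches()
```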
On Spark 1.5 this will require a `HiveContext`. If for some reason the Hive context is not an option, you can use a custom `udf`:
```scala
import org.apache.spark.sql.functions.udf

// Substring containment, equivalent to a LIKE '%' || b || '%'
val simple_like = udf((s: String, p: String) => s.contains(p))
df.where(simple_like($"a", $"b"))

// Treats the second column as a regular expression
val regex_like = udf((s: String, p: String) =>
  new scala.util.matching.Regex(p).findFirstIn(s).nonEmpty)
df.where(regex_like($"a", $"b"))
```
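The predicate logic inside these UDFs can be checked outside Spark. A plain-Scala sketch over the same sample rows as above (assuming the same data; function names are illustrative):

```scala
// Same rows as the DataFrame example above
val rows = Seq(("foo", "bar"), ("foobar", "foo"), ("foobar", "bar"))

// Predicates matching the two UDF bodies
def simpleLike(s: String, p: String): Boolean = s.contains(p)
def regexLike(s: String, p: String): Boolean =
  new scala.util.matching.Regex(p).findFirstIn(s).nonEmpty

// Mirrors df.where(simple_like($"a", $"b")): keeps only the "foobar" rows,
// matching the output of the raw-SQL LIKE query above.
val kept = rows.filter { case (a, b) => simpleLike(a, b) }
```

Note the difference between the two: `simple_like` treats `b` as a literal substring, while `regex_like` interprets `b` as a regular expression, so regex metacharacters in `b` (such as `.` or `*`) will change its behavior.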
zero323