Spark >= 2.3
In Spark 2.3 and later, the Scala-style `udf` function can be invoked directly from Java:

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

// headOption() would return an Option, not a String, so unwrap to String or null
UserDefinedFunction mode = udf(
    (UDF1<Seq<String>, String>) ss -> ss.isEmpty() ? null : ss.apply(0),
    DataTypes.StringType
);

df.select(mode.apply(col("vs"))).show();
```
Spark < 2.3
Even if we assume that your UDF is useful and cannot be replaced with a simple call to `getItem`, it has the wrong signature. Array columns are exposed as Scala `WrappedArray`, not plain Java arrays, so you have to adjust the signature accordingly:
```java
import org.apache.spark.sql.api.java.UDF1;
import scala.collection.Seq;

UDF1<Seq<String>, String> mode = new UDF1<Seq<String>, String>() {
    @Override
    public String call(final Seq<String> types) throws Exception {
        // headOption() returns an Option<String>; return a plain String (or null) instead
        return types.isEmpty() ? null : types.apply(0);
    }
};
```
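The null-safe first-element logic inside the UDF can be sketched in plain Java, using `java.util.List` as a stand-in for Scala's `Seq` (the class and method names here are illustrative, not part of the Spark API):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HeadOptionDemo {
    // Null-safe "first element" logic, mirroring what the UDF body does with Seq
    static String firstOrNull(List<String> xs) {
        return (xs == null || xs.isEmpty()) ? null : xs.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstOrNull(Arrays.asList("a", "b"))); // prints "a"
        System.out.println(firstOrNull(Collections.emptyList())); // prints "null"
    }
}
```

Returning `null` for an empty array matters: Spark represents missing values as SQL `NULL`, so the UDF must not throw on empty input.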
Once the UDF is registered:

```java
sqlContext.udf().register("mode", mode, DataTypes.StringType);
```
you can use `callUDF` (introduced in Spark 1.5) to call it by name:

```java
df.select(callUDF("mode", col("vs"))).show();
```
You can also use it in `selectExpr`:

```java
df.selectExpr("mode(vs)").show();
```