How can I call UDF on a Spark DataFrame using JAVA?

The same question as here , but I don’t have enough points to comment.

According to the latest Spark documentation , udf can be used in two different ways: one with SQL and the other with a DataFrame. I found some examples of how to use udf with SQL, but could not find a single one on how to use udf directly in a DataFrame.

The solution provided by __callUDF()__ on the question above uses __callUDF()__ which is _deprecated_ and will be removed in Spark 2.0 in accordance with the Spark Java API documentation. It is written there:

"since this is redundant with udf ()"

so that means that I should be able to use __udf()__ for __udf()__ my udf , but I cannot figure out how to do this. I have not come across anything that explains the syntax of Java-Spark programs. What am I missing?

 import org.apache.spark.sql.api.java.UDF1; . . UDF1 mode = new UDF1<String[], String>() { public String call(final String[] types) throws Exception { return types[0]; } }; sqlContext.udf().register("mode", mode, DataTypes.StringType); df.???????? how do I call my udf (mode) on a given column of my DataFrame df? 
+13
source share
1 answer

Spark> = 2.3

The udf Scala style can be invoked directly:

 import static org.apache.spark.sql.functions.*; import org.apache.spark.sql.expressions.UserDefinedFunction; UserDefinedFunction mode = udf( (Seq<String> ss) -> ss.headOption(), DataTypes.StringType ); df.select(mode.apply(col("vs"))).show(); 

Spark <2.3

Even if we assume that your UDF is useful and cannot be replaced with a simple call to getItem it has the wrong signature. Array columns are not exposed using Scala WrappedArray as simple Java arrays, so you need to configure the signature:

 UDF1 mode = new UDF1<Seq<String>, String>() { public String call(final Seq<String> types) throws Exception { return types.headOption(); } }; 

If the UDF is already registered:

 sqlContext.udf().register("mode", mode, DataTypes.StringType); 

You can simply use callUDF (this is a new function introduced in 1.5) to call it by name:

 df.select(callUDF("mode", col("vs"))).show(); 

You can also use it in selectExprs :

 df.selectExpr("mode(vs)").show(); 
+18
source

All Articles