You can create a simple function for this. First, a pair of imports:
import org.apache.spark.sql.functions.{trim, length, when}
import org.apache.spark.sql.Column
and the definition:
def emptyToNull(c: Column) = when(length(trim(c)) > 0, c)
Finally, a quick test:
val df = Seq(" ", "foo", "", "bar").toDF
df.withColumn("value", emptyToNull($"value")).show()
which should give the following result:
+-----+
|value|
+-----+
| null|
|  foo|
| null|
|  bar|
+-----+
If you want to replace the empty string with the string "NULL" instead, you can add an otherwise clause:
def emptyToNullString(c: Column) = when(length(trim(c)) > 0, c).otherwise("NULL")
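If you need this for more than one column, one possible approach is to fold over the schema and apply emptyToNull to every string column. This is only a sketch: the helper name blankStringsToNull, the object name, and the sample data are made up for illustration, and it assumes a local SparkSession and column names without dots or other special characters.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, trim, when}
import org.apache.spark.sql.types.StringType

object EmptyToNullExample {
  // Same definition as above: keep the value only when it is non-blank;
  // with no otherwise clause, every other row becomes null.
  def emptyToNull(c: Column): Column = when(length(trim(c)) > 0, c)

  // Hypothetical helper (not part of Spark): apply emptyToNull to each
  // StringType column and leave all other columns untouched.
  def blankStringsToNull(df: DataFrame): DataFrame =
    df.schema.fields
      .filter(_.dataType == StringType)
      .foldLeft(df)((acc, f) => acc.withColumn(f.name, emptyToNull(col(f.name))))

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for the demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-to-null")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one integer column and two string columns.
    val df = Seq((1, " ", "a"), (2, "foo", "")).toDF("id", "x", "y")
    blankStringsToNull(df).show()

    spark.stop()
  }
}
```

The integer column is skipped because the filter only selects StringType fields, so the helper should be safe to run on mixed-type DataFrames.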
user6910411