I have a DataFrame in Spark in which one of the columns contains an array. I have written a separate UDF that converts that array into another array containing only its distinct values. See the example below:
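For example, [24,23,27,23] must be converted to [24, 23, 27].

Code:

```python
import numpy as np
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def uniq_array(col_array):
    x = np.unique(col_array)
    return x

uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))
Df3 = Df2.withColumn("age_array_unique", uniq_array_udf(Df2.age_array))
```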
In the code above, Df2.age_array is the array column I pass to the UDF to produce a new column, "age_array_unique", which should contain only the unique values from the array.
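Note that np.unique also sorts its result, so for [24,23,27,23] it yields [23, 24, 27] rather than the [24, 23, 27] shown above. If the original order matters, here is a minimal sketch of an alternative. It assumes the column holds Python ints (possibly null) and Python 3.7+. This `uniq_array` is a hypothetical replacement, not the question's function. It keeps the first occurrence of each value and returns a plain Python list instead of a NumPy array:

```python
# Hypothetical order-preserving replacement for the question's uniq_array.
# Assumption: the input is a list of ints (elements may be None), or None itself.
def uniq_array(col_array):
    # Spark passes None for a null array cell; keep the result null too.
    if col_array is None:
        return None
    # dict.fromkeys keeps the first occurrence of each value in insertion order
    # (dicts preserve insertion order since Python 3.7), unlike np.unique, which sorts.
    # int(...) makes sure each element is a plain Python int, not a NumPy scalar.
    return [None if v is None else int(v) for v in dict.fromkeys(col_array)]

print(uniq_array([24, 23, 27, 23]))  # [24, 23, 27]
```

It could be wrapped the same way as in the question, with udf(uniq_array, ArrayType(IntegerType())). On Spark 2.4 and later, the built-in pyspark.sql.functions.array_distinct may remove the need for a Python UDF entirely.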
However, as soon as I run Df3.show(), I get this error:
net.razorvine.pickle.PickleException: expected null arguments to build a ClassDict (for numpy.core.multiarray._reconstruct)
Can anyone tell me why this is happening?
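One clue is the mention of `_reconstruct` in the error, which is part of how NumPy pickles ndarray objects. That suggests the UDF handed Spark a NumPy array instead of a plain Python list. This is an inference from the message, not something confirmed by the question. A quick local check needs only NumPy, not Spark, and shows what np.unique actually returns:

```python
import numpy as np

# What the question's UDF returns for the example input.
result = np.unique([24, 23, 27, 23])

# It is an ndarray (sorted), not a Python list.
print(type(result))  # <class 'numpy.ndarray'>
print(result)        # [23 24 27]

# Its elements are NumPy integer scalars, which are not Python ints in Python 3.
print(isinstance(result[0], int))  # False

# .tolist() yields a plain list of Python ints.
plain = result.tolist()
print(plain, type(plain[0]))  # [23, 24, 27] <class 'int'>
```

Under that assumption, one plausible workaround is to return np.unique(col_array).tolist() from the UDF. The output would still be sorted.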
Thanks!
Preyas