PySpark: expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)

I get the error "expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)" when I try the following:

I have a function that I wrap in a UDF to transform the values of a DataFrame column, like this:

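from pyspark.mllib.linalg import Vectors
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, FloatType

def func(vector):
    # does something
    return Vectors.dense(vector)

udfunc = udf(func, ArrayType(FloatType()))

new_df = df.withColumn("vector", udfunc(df.vector))
new_df.show()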

The df.vector column contains DenseVector values.

Does anyone have an idea how to fix this, or a hint?

Thanks in advance.

1 answer

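The problem is the declared return type. A Vector is represented in Spark SQL by VectorUDT, not by ArrayType(FloatType()), so declare VectorUDT() as the UDF's return type: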

from pyspark.sql import SparkSession
from pyspark.mllib.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# The return type is VectorUDT(), matching the DenseVector the lambda returns
dummy_udf = udf(lambda _: Vectors.dense([0, 0, 0]), VectorUDT())

sc.parallelize([(Vectors.dense([1, 1, 1]), )]).toDF(["x"]).select(dummy_udf("x")).show()

In Spark 2.0+, if you work with the pyspark.ml API, use Vectors and VectorUDT from pyspark.ml.linalg instead of pyspark.mllib.linalg.
