In PySpark, you will need either a `udf` or a map over the underlying RDD to convert the old `mllib` vectors to the new `ml` vectors. Let me use the first option. First, the imports:
    from pyspark.ml.linalg import VectorUDT
    from pyspark.sql.functions import udf
and the conversion function:
    as_ml = udf(lambda v: v.asML() if v is not None else None, VectorUDT())
With sample data:
    from pyspark.mllib.linalg import Vectors as MLLibVectors

    df = sc.parallelize([
        (MLLibVectors.sparse(4, [0, 2], [1, -1]), ),
        (MLLibVectors.dense([1, 2, 3, 4]), )
    ]).toDF(["features"])

    result = df.withColumn("features", as_ml("features"))
Result:
    +--------------------+
    |            features|
    +--------------------+
    |(4,[0,2],[1.0,-1.0])|
    |   [1.0,2.0,3.0,4.0]|
    +--------------------+
user6910411