How to convert ML VectorUDT functions from .mllib to .ml type

Using the PySpark ML API in Spark 2.0.0 for a simple linear regression example, I get an error from the new ML library.

The code:

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)

    from pyspark.mllib.linalg import Vectors

    data = sc.parallelize(([1, 2], [2, 4], [3, 6], [4, 8]))

    def f2Lp(inStr):
        return (float(inStr[0]), Vectors.dense(inStr[1]))

    Lp = data.map(f2Lp)
    testDF = sqlContext.createDataFrame(Lp, ["label", "features"])
    (trainingData, testData) = testDF.randomSplit([0.8, 0.2])

    from pyspark.ml.regression import LinearRegression
    lr = LinearRegression()
    model = lr.fit(trainingData)

and error:

 IllegalArgumentException: u'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce. ' 

How do I convert vector functions from .mllib to .ml?

machine-learning pyspark
1 answer

From Spark 2.0 onwards, use

 from pyspark.ml.linalg import Vectors, VectorUDT 

instead of

 from pyspark.mllib.linalg import Vectors, VectorUDT 
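A minimal rework of the question's script, changing only the vector import to the new package (this sketch assumes `sc` is an existing SparkContext, as in the question):

    from pyspark.sql import SQLContext
    from pyspark.ml.linalg import Vectors          # ml package, not mllib
    from pyspark.ml.regression import LinearRegression

    sqlContext = SQLContext(sc)

    data = sc.parallelize(([1, 2], [2, 4], [3, 6], [4, 8]))

    # Build (label, features) tuples using ml DenseVectors
    def f2Lp(inStr):
        return (float(inStr[0]), Vectors.dense(inStr[1]))

    Lp = data.map(f2Lp)
    testDF = sqlContext.createDataFrame(Lp, ["label", "features"])
    (trainingData, testData) = testDF.randomSplit([0.8, 0.2])

    lr = LinearRegression()
    model = lr.fit(trainingData)   # features column now matches ml.linalg.VectorUDT

If you already have a DataFrame whose features column was built with the old .mllib vectors, Spark 2.x also provides MLUtils.convertVectorColumnsToML to rewrite such columns to the .ml type; a short sketch, reusing testDF from the question:

    from pyspark.mllib.util import MLUtils

    # Convert the named vector columns (or all vector columns if none are named)
    convertedDF = MLUtils.convertVectorColumnsToML(testDF, "features")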
