Given the input, the simplest thing you can do is use Vectors.parse:
scala> import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.Vectors

scala> Vectors.parse("[-0.50,-2.36,-3.40]")
res14: org.apache.spark.mllib.linalg.Vector = [-0.5,-2.36,-3.4]
It also works with the sparse representation:
scala> Vectors.parse("(10,[1,5],[0.5,-1.0])")
res15: org.apache.spark.mllib.linalg.Vector = (10,[1,5],[0.5,-1.0])
To apply it to your data, all you need is:
rdd.map(Vectors.parse)
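Putting it together, here is a minimal self-contained sketch, assuming a SparkContext named sc and a small set of sample input strings (the RDD name and the sample data are illustrative, not taken from your question):

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Example input: one vector string per line, in the format Vectors.parse understands
val rdd: RDD[String] = sc.parallelize(Seq(
  "[-0.50,-2.36,-3.40]",     // dense vector
  "(10,[1,5],[0.5,-1.0])"    // sparse vector: size 10, values at indices 1 and 5
))

// Parse each line into an mllib Vector
val vectors: RDD[Vector] = rdd.map(Vectors.parse)

vectors.collect().foreach(println)
// [-0.5,-2.36,-3.4]
// (10,[1,5],[0.5,-1.0])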
If you expect malformed or empty lines, you can wrap the parsing in Try and keep only the successful results:
import scala.util.Try

rdd.map(line => Try(Vectors.parse(line))).filter(_.isSuccess).map(_.get)
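If you prefer to avoid the isSuccess/get pair, an equivalent formulation flattens the Try into an Option in a single pass (just an alternative sketch, same behavior):

import scala.util.Try

// Lines that fail to parse become None and are dropped by flatMap
rdd.flatMap(line => Try(Vectors.parse(line)).toOption)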