Difference between Apache Spark mllib.linalg vectors and spark.util vectors for machine learning

I am trying to implement neural networks in Spark and Scala, but I am not able to perform any vector or matrix multiplication. Spark provides two vector types. The spark.util vector supports dot operations, but it is deprecated. The mllib.linalg vectors do not support such operations in Scala.

Which one should be used to store weights and training data?

How can I do vector multiplication in Scala Spark with MLlib, such as w * x, where w is the weight vector or matrix and x is the input? PySpark vectors support dot, but in Scala I cannot find such a function on the vectors.

scala machine-learning apache-spark apache-spark-mllib
1 answer

Well, if you need full support for linear algebra operators, you have to implement them yourself or use an external library. In the second case, the obvious choice is Breeze.

It is already being used behind the scenes, so it doesn't introduce additional dependencies, and you can easily modify existing Spark code for conversions:

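    import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
    import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

    // Spark vector -> Breeze vector
    def toBreeze(v: Vector): BV[Double] = v match {
      case DenseVector(values) => new BDV[Double](values)
      case SparseVector(size, indices, values) => new BSV[Double](indices, values, size)
    }

    // Breeze vector -> Spark vector (only dense and sparse Breeze vectors are handled)
    def toSpark(v: BV[Double]): Vector = v match {
      case v: BDV[Double] => new DenseVector(v.toArray)
      case v: BSV[Double] =>
        // Breeze may over-allocate its backing arrays; keep only the used entries
        new SparseVector(v.length, v.index.take(v.used), v.data.take(v.used))
    }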
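As a rough illustration of the w * x case from the question, the sketch below computes a dot product and a matrix-vector product with Breeze and converts the result back to a Spark vector. The object name `BreezeOpsSketch`, the helper `asBreeze`, and the sample values are made up for this example, and it assumes Breeze and spark-mllib are on the classpath. For brevity it densifies every input via `toArray`, which would not be a good idea for large sparse vectors.

```scala
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object BreezeOpsSketch {
  // Hypothetical helper: copy a Spark vector's values into a Breeze dense vector.
  // toArray densifies sparse input, which is acceptable only for small vectors.
  def asBreeze(v: Vector): BDV[Double] = new BDV[Double](v.toArray)

  // Inner product w . x of two Spark vectors, computed by Breeze.
  def dot(w: Vector, x: Vector): Double = asBreeze(w) dot asBreeze(x)

  // Weight matrix times input vector, returned as a Spark vector.
  def times(weights: BDM[Double], x: Vector): Vector =
    Vectors.dense((weights * asBreeze(x)).toArray)

  def main(args: Array[String]): Unit = {
    val w = Vectors.dense(0.5, -1.0, 2.0)
    val x = Vectors.dense(2.0, 1.0, 3.0)

    // 0.5*2 - 1*1 + 2*3 = 6.0
    println(dot(w, x))

    // 2 x 3 weight matrix, written row by row
    val weights = BDM((0.5, -1.0, 2.0), (1.0, 0.0, 1.0))
    println(times(weights, x))
  }
}
```

Keeping the weights as Breeze objects between updates and converting only at the boundary with Spark APIs avoids repeated copying.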

Mahout also provides Spark and Scala bindings, which you may find interesting.

For simple matrix-vector multiplications, it is easier to use the existing matrix methods. For example, IndexedRowMatrix and RowMatrix provide multiply methods that can take a local matrix. You can check Matrix Multiplication in Apache Spark for an example usage.
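A minimal sketch of the `RowMatrix.multiply` route, assuming a local SparkContext and spark-mllib on the classpath. The object name, the helper `multiplyRows`, and the data are illustrative only. Each row of the distributed matrix is one input x, and the local matrix holds the weights.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object RowMatrixMultiplySketch {
  // Hypothetical helper: multiply distributed rows by a local dense matrix
  // and collect the resulting rows to the driver.
  def multiplyRows(sc: SparkContext, rows: Seq[Vector], weights: Matrix): Array[Vector] = {
    val x = new RowMatrix(sc.parallelize(rows))
    x.multiply(weights).rows.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("multiply-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rows = Seq(Vectors.dense(1.0, 2.0, 3.0), Vectors.dense(4.0, 5.0, 6.0))

      // Local 3 x 2 matrix; Matrices.dense expects values in column-major order,
      // so the columns here are (1, 0, 1) and (0, 1, 0).
      val weights = Matrices.dense(3, 2, Array(1.0, 0.0, 1.0, 0.0, 1.0, 0.0))

      // Row (1, 2, 3) maps to (4, 2) and row (4, 5, 6) maps to (10, 5).
      multiplyRows(sc, rows, weights).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The result stays distributed until `collect()` is called, so in a real job you would normally keep working on `rows` as an RDD instead of collecting.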
