Well, if you need full support for linear algebra operators, you have to implement them yourself or use an external library. In the latter case, the obvious choice is Breeze.
It is already used behind the scenes, so it doesn't introduce additional dependencies, and you can easily convert between Spark and Breeze vectors:
```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

def toBreeze(v: Vector): BV[Double] = v match {
  case DenseVector(values) =>
    new BDV[Double](values)
  case SparseVector(size, indices, values) =>
    new BSV[Double](indices, values, size)
}

def toSpark(v: BV[Double]): Vector = v match {
  case v: BDV[Double] => new DenseVector(v.toArray)
  case v: BSV[Double] => new SparseVector(v.length, v.index, v.data)
}
```
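For instance, the conversions give you access to Breeze's element-wise operators, which Spark's local vectors don't expose. A minimal sketch (the helpers from above are repeated so the snippet is self-contained):

```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

def toBreeze(v: Vector): BV[Double] = v match {
  case DenseVector(values) => new BDV[Double](values)
  case SparseVector(size, indices, values) => new BSV[Double](indices, values, size)
}

def toSpark(v: BV[Double]): Vector = v match {
  case v: BDV[Double] => new DenseVector(v.toArray)
  case v: BSV[Double] => new SparseVector(v.length, v.index, v.data)
}

// Element-wise addition happens in Breeze, then the result goes back to Spark
val sum = toSpark(toBreeze(Vectors.dense(1.0, 2.0)) + toBreeze(Vectors.dense(3.0, 4.0)))
```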
Mahout provides Spark and Scala bindings that you may find interesting as well.
For simple matrix-vector multiplications, it is easier to use existing matrix methods. For example, IndexedRowMatrix and RowMatrix provide multiply methods that take a local matrix. You can check Matrix Multiplication in Apache Spark for example usage.
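As a sketch of that approach, assuming a local SparkSession for illustration: since RowMatrix.multiply takes a local Matrix, the vector is wrapped as an n×1 matrix.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val spark = SparkSession.builder.master("local[1]").appName("matvec").getOrCreate()

// A 2x2 distributed matrix [[1, 2], [3, 4]]
val mat = new RowMatrix(spark.sparkContext.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0))))

// Wrap the vector (1, 1) as a 2x1 local matrix to pass it to multiply
val v = Matrices.dense(2, 1, Array(1.0, 1.0))

// The product is itself a distributed 2x1 RowMatrix
val result = mat.multiply(v)
```

Note that the rows of the resulting RowMatrix carry no index, so if row order matters, IndexedRowMatrix is the safer choice.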
zero323