I am trying to use the RowMatrix methods columnSimilarities() and computeColumnSummaryStatistics(),
especially columnSimilarities(), which is described in this post:
https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html
I build the matrix from a list of MLlib sparse vectors:
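from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

# df is a pandas DataFrame and sc is the active SparkContext.
# One sparse row per group of column 0; column 1 holds the indices, column 2 the values.
n_cols = len(df[1].unique())
sparse_vectors = []
for cust, group in df.groupby(0):
    # Vectors.sparse needs the indices in increasing order
    i_v = sorted(zip(group[1].values, group[2].values))
    indices = [x[0] for x in i_v]
    values = [x[1] for x in i_v]
    sparse_vectors.append(Vectors.sparse(n_cols, indices, values))
rows = sc.parallelize(sparse_vectors)
mat = RowMatrix(rows)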
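For reference, columnSimilarities() is meant to return the cosine similarity between each pair of columns, for the upper triangle only. A minimal pure-Python sketch of that computation on toy data, useful as a sanity check on a small sample; the helper name column_cosine_similarities and the sample rows are made up for illustration and are not part of Spark:

```python
import math
from collections import defaultdict
from itertools import combinations


def column_cosine_similarities(rows):
    """rows: list of {column_index: value} dicts, one per sparse row.

    Returns {(i, j): cosine similarity} for column pairs with i < j,
    leaving out pairs whose dot product is zero.
    """
    dot = defaultdict(float)      # (i, j) -> sum over rows of v_i * v_j
    sq_norm = defaultdict(float)  # column -> sum of squared values
    for row in rows:
        for i, v in row.items():
            sq_norm[i] += v * v
        # Sorting by column index guarantees i < j in every pair
        for (i, vi), (j, vj) in combinations(sorted(row.items()), 2):
            dot[(i, j)] += vi * vj
    return {
        (i, j): d / math.sqrt(sq_norm[i] * sq_norm[j])
        for (i, j), d in dot.items()
        if d != 0
    }


# Toy data (made up): three customers, three item columns
rows = [{0: 1.0, 1: 2.0}, {1: 1.0, 2: 3.0}, {0: 2.0, 2: 1.0}]
sims = column_cosine_similarities(rows)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 4))
```

This is the result I would expect mat.columnSimilarities() to reproduce on the same rows.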
Whenever I call either method, I get an error:
AttributeError: 'RowMatrix' object has no attribute 'computeColumnSummaryStatistics'
or
AttributeError: 'RowMatrix' object has no attribute 'columnSimilarities'
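A quick way to confirm which methods the installed version actually exposes is to probe the object directly. A minimal sketch: missing_methods is a hypothetical helper, and FakeMatrix only stands in for mat so that the snippet runs without Spark.

```python
def missing_methods(obj, names):
    """Return the names in `names` that `obj` does not expose as callables."""
    return [n for n in names if not callable(getattr(obj, n, None))]


class FakeMatrix:
    """Stand-in for the RowMatrix above (assumption: only numRows exists)."""

    def numRows(self):
        return 0


wanted = ["numRows", "columnSimilarities", "computeColumnSummaryStatistics"]
missing = missing_methods(FakeMatrix(), wanted)
print(missing)
```

With Spark available, passing mat instead of FakeMatrix() lists whichever of the methods the installed PySpark build lacks, and dir(mat) shows everything that is exposed.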
Is this a limitation of PySpark compared with the Scala API? I also cannot find a documentation page for the RowMatrix methods through a Google search.
Thanks.