Spark MLLib Collaborative Filtering with a new user

Question


I am testing a collaborative filtering algorithm implemented in Spark, and I am running into the following problem:

Suppose I train a model with the following data:

u1|p1|3
u1|p2|3
u2|p1|2
u2|p2|3

Now, if I test it with the following data:

u1|p1|1
u3|p1|2
u3|p2|3

I never see ratings for user "u3", presumably because this user does not appear in the training data. Is this the cold-start problem? I was under the impression that cold start only applies to a new product. In this case, I expected a prediction for "u3", since "u1" and "u2" in the training data have ratings similar to those of "u3". Is this the difference between model-based and memory-based collaborative filtering?


collaborative-filtering apache-spark apache-spark-mllib

Navin viswanath Mar 20 '15 at 5:34


1 answer

stholzm · Accepted Answer · 2015-04-11T15:38:32+0000

I assume you are talking about the ALS algorithm?

'u3' is not part of your training set, and therefore your model knows nothing about this user. The most it could possibly return is the average rating over all users.
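To make the "average rating" idea concrete, here is a minimal sketch in plain Python (no Spark), using the training data from the question. The helper name `fallback_rating` and the choice to prefer a per-product mean over the global mean are my own assumptions for illustration; nothing like this is built into the ALS model.

```python
from collections import defaultdict

# Training ratings from the question: (user, product, rating)
train = [("u1", "p1", 3.0), ("u1", "p2", 3.0), ("u2", "p1", 2.0), ("u2", "p2", 3.0)]

known_users = {user for user, _, _ in train}
global_mean = sum(rating for _, _, rating in train) / len(train)

# Group ratings by product so each product's mean can be computed
by_product = defaultdict(list)
for _, product, rating in train:
    by_product[product].append(rating)
product_mean = {p: sum(rs) / len(rs) for p, rs in by_product.items()}


def fallback_rating(product):
    """Hypothetical fallback for a user that has no training data."""
    # Use the product's mean rating; fall back to the global mean
    # if the product is unknown as well.
    return product_mean.get(product, global_mean)


for user, product in [("u3", "p1"), ("u3", "p2")]:
    if user not in known_users:
        print(user, product, fallback_rating(product))
# prints:
# u3 p1 2.5
# u3 p2 3.0
```

Such a baseline ignores everything specific to 'u3', which is exactly the point: without training data for the user there is nothing personal to predict from.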

Looking at the Spark 1.3.0 Scala code: the MatrixFactorizationModel returned by ALS.train() tries to look up the user and the product in its feature vectors when you call predict(). I get a NoSuchElementException when I try to predict a rating for an unknown user. It is simply implemented that way.
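The lookup behaviour can be illustrated with a toy stand-in written in plain Python. This is not Spark's code: the dictionaries, the feature values, and the `safe_predict` helper are all made up for illustration, and Python's `KeyError` plays the role of the `NoSuchElementException`.

```python
# Toy stand-in for a factorization model: one latent vector per ID seen in training.
# The numbers are invented; a real model learns them from the ratings.
user_features = {"u1": [1.2, 0.9], "u2": [1.0, 0.8]}
product_features = {"p1": [1.1, 1.3], "p2": [1.4, 1.5]}


def predict(user, product):
    """Dot product of the two latent vectors; raises KeyError for an unknown ID."""
    u = user_features[user]
    p = product_features[product]
    return sum(a * b for a, b in zip(u, p))


def safe_predict(user, product):
    """Return None instead of raising when either ID was not seen in training."""
    if user not in user_features or product not in product_features:
        return None
    return predict(user, product)


print(safe_predict("u1", "p1"))  # a number: both IDs are known
print(safe_predict("u3", "p1"))  # None: "u3" has no feature vector
```

With a real model the same guard amounts to filtering the test set down to users and products that occurred in the training set before asking for predictions.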
