Spark MLlib Collaborative Filtering with a new user

I am testing a collaborative filtering algorithm implemented in Spark, and I am running into the following problem:

Suppose I train a model with the following data:

u1|p1|3
u1|p2|3
u2|p1|2
u2|p2|3

Now, if I test it with the following data:

u1|p1|1
u3|p1|2
u3|p2|3

I never see ratings for user "u3", presumably because this user does not appear in the training data. Is this the cold start problem? I was under the impression that cold start only applies to a new product. In this case, I expected a prediction for "u3", since "u1" and "u2" in the training data have rating information similar to that of "u3". Is this the difference between model-based and memory-based collaborative filtering?

1 answer

I assume you are talking about the ALS algorithm?

'u3' is not part of your training set, so your model knows nothing about this user. The best it could possibly do is return the average rating over all users.

Looking at the Spark 1.3.0 Scala code: the MatrixFactorizationModel returned by ALS.train() tries to look up the user and the product in the feature vectors when you call predict(). I get a NoSuchElementException when I try to predict a rating for an unknown user. It is simply implemented that way.
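
To make the behavior described above concrete, here is a minimal pure-Python sketch. It is not Spark's API: the factor values and the names user_factors, product_factors, predict and predict_or_mean are made up for illustration, and KeyError stands in for the NoSuchElementException mentioned above. It only shows why a lookup-based model cannot score "u3", and how a caller might fall back to the average rating.

```python
# Toy stand-in for a matrix factorization model; NOT Spark's API.
# Training data taken from the question: (user, product, rating).
train = [("u1", "p1", 3.0), ("u1", "p2", 3.0), ("u2", "p1", 2.0), ("u2", "p2", 3.0)]

# Hypothetical learned factors; the numbers are invented for illustration.
# Only ids that appeared in the training data get a vector.
user_factors = {"u1": [1.0, 0.5], "u2": [0.8, 0.6]}
product_factors = {"p1": [2.0, 1.0], "p2": [2.2, 1.4]}

# Average of all training ratings, used as the fallback value.
global_mean = sum(rating for _, _, rating in train) / len(train)


def predict(user, product):
    """Dot product of the two factor vectors; raises KeyError for unseen ids."""
    u = user_factors[user]        # fails here for "u3"
    p = product_factors[product]
    return sum(a * b for a, b in zip(u, p))


def predict_or_mean(user, product):
    """Fall back to the global mean when either id was never trained on."""
    try:
        return predict(user, product)
    except KeyError:
        return global_mean


print(predict_or_mean("u1", "p1"))  # known user: uses the factor vectors
print(predict_or_mean("u3", "p1"))  # unknown user: falls back to the mean
```

The fallback is a design choice made by the caller, not something the model does on its own. Getting a real prediction for "u3" would require retraining with that user's ratings included.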


Source: https://habr.com/ru/post/1215723/

