Implementing a recommendation system with unsupervised learning

I have gone through papers and books on recommendation systems and the approaches proposed for building them. Many of them use the Netflix Prize as an example. On Netflix, users rate movies from 1 to 5. In that competition, participants were given a database of films and the corresponding user ratings, and had to build a system that predicts film ratings as accurately as possible; the predicted ratings are then used to suggest films to users.

For evaluation, they suggest cross-validation with measures that take the predicted and the actual ratings as arguments. The predicted rating is computed from the user's history and their ratings of other films.
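As a concrete instance of such a measure, here is a minimal sketch of root-mean-square error (RMSE), the metric used to score Netflix Prize entries. The function name and the sample ratings are made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out ratings on the 1-5 scale (illustration only).
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 2.0]
error = rmse(predicted, actual)  # lower is better; 0 means perfect predictions
```

In a cross-validation setup, the same function would be applied to each held-out fold and the results averaged.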

I am trying to build a recommendation system for news. The problem I am facing is that a news item is relevant only for a short time, and almost no one rates news. So I only have implicit feedback (views) and no explicit feedback (ratings). Also, in the Netflix problem a database is provided up front. I’m wondering how to deal with the “cold start” problem, because at the beginning no news will have been read (viewed).

I would be grateful for suggestions on how to get around the cold start problem and, once I have an algorithm, on how to check that it works well.

Thanks!

+5
2 answers

Films are a great fit for classic collaborative filtering: people stay interested in them for a long time, there are relatively few of them, many people have overlapping interests, and star ratings make sense. News stories are completely different. Instead of collaborative filtering, look at content-based filtering. That is, match people's interests to features of the content (which may be keywords in the story, the publisher, or metadata such as the time of day or region of the world). View counts are your best source of information about people's preferences, and they also let you use some data mining techniques, such as association rule mining.
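To make the content-based idea concrete, here is a minimal sketch that builds a user profile from the articles they viewed and ranks the remaining articles by keyword similarity. It assumes TF-IDF weighting and cosine similarity, which are one common choice rather than the only one; the function names and sample headlines are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Map each article id to a sparse {term: weight} TF-IDF vector."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter(t for words in tokens.values() for t in set(words))
    n = len(docs)
    vectors = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        # Smoothed inverse document frequency: rarer terms weigh more.
        vectors[doc_id] = {
            t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        }
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(docs, viewed_ids, k=3):
    """Rank unseen articles by similarity to the summed vectors of viewed ones."""
    vectors = tfidf_vectors(docs)
    profile = {}
    for doc_id in viewed_ids:  # implicit feedback: a view counts as interest
        for term, weight in vectors[doc_id].items():
            profile[term] = profile.get(term, 0.0) + weight
    scored = [
        (doc_id, cosine(profile, vec))
        for doc_id, vec in vectors.items()
        if doc_id not in viewed_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical headlines (illustration only).
articles = {
    "a1": "central bank raises interest rates again",
    "a2": "bank cuts rates as inflation slows",
    "a3": "local team wins football championship",
    "a4": "football coach praises team after win",
}
top = recommend(articles, viewed_ids=["a1"], k=1)
```

Because a new article only needs its own text to get a vector, this approach can score it before anyone has viewed it.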

You will still have a cold start problem for users, where a new user on your system has not given you any information about their preferences (unless you pull it from their tweets, Facebook interests or something similar), but you can avoid the cold start problem for items. Instead of relying on the news read by your community as the only way to find similar items, you can use a different corpus. In particular, try Wikipedia and see WikiBrain ( https://github.com/shilad/wikibrain ). It is an API that gives you the similarity of one concept to another, which you can apply to your recommendations.

+2

To get started with this project, I would suggest clustering to find patterns in which news is relevant / popular. The more features you include, the more value they can add to your results (this part requires careful selection, study and statistical analysis).
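As a rough illustration of the clustering step, here is a minimal k-means sketch. The choice of k-means, the two features (normalized views per hour and share of positive comments) and the sample values are all assumptions made for the example; real features would need scaling and the careful selection mentioned above.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (cluster index per point, centroids)."""
    # Deterministic init from the first k points, chosen only to keep the
    # example reproducible; random or k-means++ init is more usual.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical features per article: (normalized views/hour, positive-comment share).
features = [
    (0.90, 0.8),
    (0.85, 0.7),
    (0.05, 0.2),
    (0.10, 0.1),
    (1.00, 0.9),
    (0.02, 0.3),
]
labels, centers = kmeans(features, k=2)
```

With this sample data the popular articles and the little-read ones end up in separate clusters, which can then be inspected for shared keywords or publishers.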

To recommend news, you could use a multi-level approach: let the first level check for articles that are “positive” / contain specific keywords from the people who commented on the article.

Then, perhaps, the second level would cross-reference the Twitter response to the article, Facebook likes / traffic, how many Pinterest users pin the article, etc.

You can also check trending keywords from Google, Bing, etc. on specific topics to make sure the article you are showing is relevant.

I also suggest starting small, since there are so many articles on the Internet: perhaps concentrate on one topic first and then generalize. The popularity of “articles” is also tied to certain voices that people follow, which gives you another way to gauge an article's relevance.
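The levels described above could be wired together roughly as follows. This is only a sketch: the `Article` fields, the way social counts are summed, and the doubling for trending keywords are arbitrary assumptions for illustration, not tuned values, and the counts would have to come from whatever data you actually collect.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    comment_keywords: set  # keywords extracted from reader comments
    tweets: int = 0
    likes: int = 0
    pins: int = 0


def layered_rank(articles, wanted_keywords, trending_keywords):
    """Filter by comment keywords, then rank by social response and trends."""
    # Level 1: keep articles whose commenters used at least one wanted keyword.
    candidates = [a for a in articles if a.comment_keywords & wanted_keywords]

    def score(article):
        # Level 2: raw social response (unweighted sum; an assumption).
        social = article.tweets + article.likes + article.pins
        # Trend check: double the score if a keyword is currently trending.
        boost = 2 if article.comment_keywords & trending_keywords else 1
        return social * boost

    return sorted(candidates, key=score, reverse=True)


# Hypothetical data (illustration only).
articles = [
    Article("Rates rise", {"economy", "rates"}, tweets=10, likes=5),
    Article("Cup final", {"football"}, tweets=50, likes=40, pins=10),
    Article("Inflation cools", {"inflation", "prices"}, tweets=4, likes=4, pins=1),
]
ranked = layered_rank(
    articles,
    wanted_keywords={"economy", "inflation"},
    trending_keywords={"inflation"},
)
titles = [a.title for a in ranked]
```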

Here's more information about unsupervised learning: http://en.wikipedia.org/wiki/Unsupervised_learning

You might also want to look into expectation maximization (EM) to find which variables help account for the unobserved data in what you collected. Here's a full explanation of EM: https://stats.stackexchange.com/questions/72774/numerical-example-to-understand-expectation-maximization
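For a feel of how EM works, here is a minimal sketch that fits a mixture of two one-dimensional Gaussians. The setting is an assumption for illustration: the observed values are made-up reading times in minutes, and the unobserved variable is which of two reader groups (skimmers or careful readers) produced each one.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    # Crude starting guesses: extremes as means, unit variance, equal weights.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            likes = [
                weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)
            ]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Hypothetical reading times in minutes (illustration only).
reading_times = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
weights, means, variances = em_two_gaussians(reading_times)
```

The fitted means separate the two groups even though no observation was labeled, which is the sense in which EM recovers structure from unobserved data.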

0

Source: https://habr.com/ru/post/1211732/

