To get started, I would suggest clustering to find patterns in which news is relevant or popular. The more features you include that genuinely add value to your results, the better, but this part requires careful feature selection, study and statistical analysis.
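As a rough illustration of the clustering idea, here is a minimal k-means sketch in plain Python. The two features per article (comment count and share count) and the sample numbers are made up for the example; in practice you would choose and scale your own features, and a library implementation would likely be a better fit than this hand-rolled loop.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric tuples; returns (centroids, labels)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical features per article: (comments, shares).
articles = [(2, 1), (3, 2), (1, 1), (40, 55), (38, 60), (45, 50)]
centroids, labels = kmeans(articles, k=2)
print(labels)
```

Articles that land in the same cluster have similar engagement, so one cluster can be treated as a candidate "popular" group and inspected further.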
To recommend news, you can take a multi-level approach. The first level could check for articles whose comments are "positive" or contain specific keywords.
Then, perhaps, the second level could cross-reference the Twitter response to the article with its Facebook likes and traffic, how often Pinterest users pin the article, and so on.
You can also check trending keywords from Google, Bing, etc. on specific topics to make sure the article you are showing is relevant.
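The levels described above could be wired together roughly like this. Everything here is an assumption made for illustration: the `Article` fields, the keyword list, the weights and the sample counts are invented, and real numbers would have to come from whatever comment, social and trend data you actually have access to.

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    comments: list          # raw comment texts
    tweets: int = 0
    facebook_likes: int = 0
    pins: int = 0

# Hypothetical hand-picked keyword list standing in for a real sentiment check.
POSITIVE_KEYWORDS = {"great", "useful", "insightful"}

def passes_comment_filter(article, keywords=POSITIVE_KEYWORDS, min_hits=1):
    """Level 1: keep articles whose comments mention enough of the keywords."""
    words = {w.strip(".,!?").lower() for c in article.comments for w in c.split()}
    return len(words & keywords) >= min_hits

def social_score(article, weights=(1.0, 0.5, 0.25)):
    """Level 2: weighted sum of social signals (the weights are placeholders)."""
    w_tweet, w_like, w_pin = weights
    return (w_tweet * article.tweets
            + w_like * article.facebook_likes
            + w_pin * article.pins)

def is_trending(article, trending_terms):
    """Level 3: does the title share a word with the trending terms?"""
    return bool({w.lower() for w in article.title.split()} & trending_terms)

def recommend(articles, trending_terms, top_n=3):
    """Filter by levels 1 and 3, then rank the survivors by level 2."""
    candidates = [a for a in articles
                  if passes_comment_filter(a) and is_trending(a, trending_terms)]
    return sorted(candidates, key=social_score, reverse=True)[:top_n]

articles = [
    Article("Election results analysis", ["Great read!", "Very useful."], 120, 300, 4),
    Article("Election night recap", ["Insightful summary"], 50, 80, 0),
    Article("Celebrity gossip roundup", ["Great photos"], 900, 2000, 50),
    Article("Election polling errors", ["Misleading and wrong"], 400, 100, 1),
]
picks = recommend(articles, trending_terms={"election"})
print([a.title for a in picks])
```

Keeping each level as its own function makes it easy to swap the keyword check for a proper sentiment model later, or to tune the weights once you have real engagement data.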
I also suggest starting small. There are so many articles on the Internet, so consider concentrating on one topic first and generalizing afterwards. As you will see, the popularity of articles is tied to certain voices that people follow, which gives you another way to gauge the relevance of an article.
Here's more information about unsupervised learning: http://en.wikipedia.org/wiki/Unsupervised_learning
You might also want to study expectation maximization (EM) to estimate the latent variables that would explain the unobserved part of the data you have collected. Here's a full explanation of EM: https://stats.stackexchange.com/questions/72774/numerical-example-to-understand-expectation-maximization
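To give a feel for what EM does, here is a small sketch that fits a two-component one-dimensional Gaussian mixture in plain Python. The data are invented "engagement scores", and the latent variable is assumed to be which of two groups (say, ordinary versus popular) each article belongs to. The initialisation and the variance floor are simplifications chosen for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iterations=50):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances), one entry per component.
    """
    # Crude initialisation: smallest and largest values as starting means.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances

# Hypothetical engagement scores drawn from two visibly different groups.
scores = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(scores)
print(weights, means, variances)
```

The group membership is never observed directly; EM alternates between guessing it (E-step) and refitting the group parameters given that guess (M-step).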