How can I improve this PHP / MySQL news feed?

Let me start right off the bat by saying that I know this is not the best solution. I know it is kludgy and hackish. But that's why I am here!

This question builds on some discussion on Quora with Andrew Bosworth, creator of the Facebook News Feed.

I am creating a news feed. It is built exclusively in PHP and MySQL.





MySQL

The relational model for the feed consists of two tables. One table functions as an activity log; in fact, it is called activity_log. The other table is newsfeed. The two tables are almost identical.

The schema for the log is activity_log(uid INT(11), activity ENUM, activity_id INT(11), title TEXT, date TIMESTAMP)

...and the schema for the feed is newsfeed(uid INT(11), poster_uid INT(11), activity ENUM, activity_id INT(11), title TEXT, date TIMESTAMP).

Whenever a user does something feed-related, for example asking a question, it is immediately recorded in the activity log.
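A minimal sketch of the write side, assuming an in-memory array stands in for the activity_log table so it runs without a database. The class name InMemoryActivityLog and the activity values 'question'/'answer' are hypothetical (the question does not list the ENUM values); a real version would run an INSERT through PDO or mysqli instead.

```php
<?php
// Hypothetical in-memory stand-in for the activity_log table.
// Each row mirrors its columns: uid, activity, activity_id, title, date.
final class InMemoryActivityLog
{
    private array $rows = [];

    // Record an action at the moment the user performs it.
    public function log(int $uid, string $activity, int $activityId, string $title, ?int $date = null): void
    {
        $this->rows[] = [
            'uid'         => $uid,
            'activity'    => $activity,
            'activity_id' => $activityId,
            'title'       => $title,
            'date'        => $date ?? time(), // Unix timestamp in place of TIMESTAMP
        ];
    }

    // Newest-first activity for one user, capped like the LIMIT 100 the question mentions.
    public function getUsersActivity(int $uid, int $limit = 100): array
    {
        $mine = array_values(array_filter($this->rows, fn(array $r): bool => $r['uid'] === $uid));
        usort($mine, fn(array $a, array $b): int => $b['date'] <=> $a['date']);
        return array_slice($mine, 0, $limit);
    }
}
```

Logging at write time keeps the cost of each action small (one row); the expensive part is deciding who should see it, which is what the cron job below handles.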




Creating News Feeds

Then every X minutes (currently 5 minutes; this will change to 15-30 minutes later), I run a cron job that executes the script below. The script goes through every user in the database, finds all the activity of each of that user's friends, and writes those actions into the user's news feed.

Currently, the SQL that selects the activity (in ActivityLog::getUsersActivity()) has a LIMIT 100 clause for performance.* *Not that I know what I'm talking about.

    <?php
    $user        = new User();
    $activityLog = new ActivityLog();
    $friend      = new Friend();
    $newsFeed    = new NewsFeed();

    // Get all the users
    $usersArray = $user->getAllUsers();

    foreach ($usersArray as $userArray) {
        $uid = $userArray['uid'];

        // Get the user's friends (getFriends() returns JSON);
        // fall back to an empty list if decoding fails
        $friendsArray = json_decode($friend->getFriends($uid), true) ?: [];

        // Get the activity of each friend
        foreach ($friendsArray as $friendArray) {
            $array = $activityLog->getUsersActivity($friendArray['fid2']);

            // Only write if the friend has activity
            if (!empty($array)) {
                // Add each piece of activity to the news feed
                foreach ($array as $news) {
                    $newsFeed->addNews(
                        $uid,
                        $friendArray['fid2'],
                        $news['activity'],
                        $news['activity_id'],
                        $news['title'],
                        $news['time'] // assumes the query aliases the `date` column AS time
                    );
                }
            }
        }
    }



Displaying the Feed

In the client code, when fetching a user's feed, I do something like:

    $feedArray = $newsFeed->getUsersFeedWithLimitAndOffset($uid, 25, 0);

    foreach ($feedArray as $feedItem) {
        // Use a switch to determine the activity type here, and display based on type
        // e.g. "User Name asked A Question"
        // where "A Question" == $feedItem['title'];
    }
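The switch in that display loop might look like the sketch below. The activity values ('question', 'answer', 'follow'), the 'name' key for the poster's display name (the newsfeed table only stores poster_uid, so this assumes a join against the users table), and the function name renderFeedItem are all hypothetical. htmlspecialchars() escapes user-supplied text before it is output.

```php
<?php
// Hypothetical renderer: turns one newsfeed row into a line of HTML.
// Assumes the row carries 'activity', 'title', and the poster's display name in 'name'.
function renderFeedItem(array $feedItem): string
{
    $name  = htmlspecialchars($feedItem['name'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($feedItem['title'], ENT_QUOTES, 'UTF-8');

    switch ($feedItem['activity']) {
        case 'question':
            return "$name asked $title";
        case 'answer':
            return "$name answered $title";
        case 'follow':
            return "$name started following $title";
        default:
            // Unknown activity types are skipped rather than shown half-formatted.
            return '';
    }
}

// Usage with rows shaped like a joined feed page.
$feedArray = [
    ['name' => 'Alice', 'activity' => 'question', 'title' => 'How do feeds scale?'],
    ['name' => 'Bob',   'activity' => 'answer',   'title' => 'Fan-out <b>tips</b>'],
];
foreach ($feedArray as $feedItem) {
    $line = renderFeedItem($feedItem);
    if ($line !== '') {
        echo $line, PHP_EOL;
    }
}
```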



Improving the Feed

Forgive my limited understanding of news feed best practices, but I understand my approach to be a limited version of what is called fan-out on write: limited in the sense that I run a cron job as an intermediate step instead of writing directly to users' news feeds. Still, it is very different from the pull model, in that a user's news feed is not compiled at load time but rather on a regular schedule.
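For contrast, the pull model (fan-out on read) builds the feed only when it is requested: fetch each friend's recent activity, merge it, sort newest-first, and return one page. A minimal in-memory sketch, assuming activity rows shaped like activity_log with a Unix timestamp in 'date'; buildFeedOnRead is a hypothetical name.

```php
<?php
// Hypothetical pull-model feed builder.
// $activityByUid maps uid => list of activity_log-shaped rows.
function buildFeedOnRead(array $friendUids, array $activityByUid, int $limit = 25, int $offset = 0): array
{
    $merged = [];
    foreach ($friendUids as $fid) {
        foreach ($activityByUid[$fid] ?? [] as $row) {
            $merged[] = $row + ['poster_uid' => $fid]; // mirror newsfeed.poster_uid
        }
    }
    // Newest first, then page through like LIMIT/OFFSET.
    usort($merged, fn(array $a, array $b): int => $b['date'] <=> $a['date']);
    return array_slice($merged, $offset, $limit);
}
```

The trade-off: every page view does more work, but a new post is visible immediately and nothing is copied once per follower.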

This is a big question, which probably deserves a lot of back and forth, but I think it can serve as a touchstone for many important conversations that new developers like me should have. I'm just trying to figure out what I'm doing wrong, how I can improve, or how I should even start from scratch and try a different approach.

Another thing that bugs me about this model is that it works on the basis of recency, not relevance. If anyone can suggest how to improve it to work with relevance, I am all ears. I use the Directed Edge API to generate recommendations, but it seems that for something like a news feed, recommendations won't work (since nothing existed before!).

+60
php mysql web-applications feeds
Nov 12 '10 at 5:55
5 answers

Really cool question. I am actually just starting to implement something like this myself, so I'll think out loud a little.

Here are the flaws that I see in my mind with your current implementation:

  • You process all friends for all users, so you end up processing the same users many times, because people in the same circles share many of the same friends.

  • If one of my friends posts something, it may not appear in my news feed for up to 5 minutes. Shouldn't it appear right away?

  • We reread all the activity for every user on each run. Don't we just need to grab the new actions since the last time we crunched the logs?

  • It does not scale very well.
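The first and third points can be addressed together: remember when the job last ran, read only activity newer than that, and look up each poster's activity once no matter how many followers they have. A sketch under those assumptions; the $lastRun watermark and the function name incrementalFanOut are hypothetical, and in SQL the filter would be a `WHERE date > ?` condition.

```php
<?php
// Hypothetical incremental fan-out: only activity newer than $lastRun is copied,
// and each poster's activity is filtered once, even if many users follow them.
// $friendsByUid: uid => list of friend uids; $activityByUid: uid => activity rows.
// Returns [newsfeed-shaped rows, new watermark for the next run].
function incrementalFanOut(array $friendsByUid, array $activityByUid, int $lastRun): array
{
    $newByPoster = []; // poster uid => that poster's rows newer than $lastRun
    $feedRows = [];
    $watermark = $lastRun;

    foreach ($friendsByUid as $uid => $friendUids) {
        foreach ($friendUids as $fid) {
            if (!isset($newByPoster[$fid])) {
                $newByPoster[$fid] = array_values(array_filter(
                    $activityByUid[$fid] ?? [],
                    fn(array $r): bool => $r['date'] > $lastRun
                ));
            }
            foreach ($newByPoster[$fid] as $row) {
                // Left-hand keys win, so 'uid' becomes the feed owner, as in newsfeed.
                $feedRows[] = ['uid' => $uid, 'poster_uid' => $fid] + $row;
                $watermark = max($watermark, $row['date']);
            }
        }
    }
    return [$feedRows, $watermark];
}
```

With second-resolution timestamps, a row written in the same second as the watermark can be skipped on the next run; an auto-increment id column, if you add one, makes a safer watermark.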

A feed looks exactly like the activity log, so I would stick with the activity log table alone.

If you shard the activity logs across databases, scaling becomes easy. You can shard your users too if you want, but even with 10 million user records in one table, MySQL should handle the reads fine. That way, whenever you look up a user, you know which shard holds that user's logs. If you archive old logs every so often and keep only a fresh set, you won't have to shard as much, or maybe not at all. MySQL can handle many millions of records if it is even moderately well tuned.
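A minimal sketch of routing a user's logs to a shard by uid. The shard count and DSN strings are placeholders, and simple modulo routing is only one option: it forces data to move whenever the number of shards changes, which a uid-to-shard lookup table avoids.

```php
<?php
// Hypothetical shard map: each entry is a DSN for one MySQL server holding part of activity_log.
const ACTIVITY_SHARDS = [
    'mysql:host=shard0.example;dbname=feed',
    'mysql:host=shard1.example;dbname=feed',
    'mysql:host=shard2.example;dbname=feed',
    'mysql:host=shard3.example;dbname=feed',
];

// Deterministic routing: the same (non-negative) uid always maps to the same shard.
function shardForUser(int $uid): int
{
    return $uid % count(ACTIVITY_SHARDS);
}

// DSN to pass to new PDO(...) when reading or writing this user's logs.
function dsnForUser(int $uid): string
{
    return ACTIVITY_SHARDS[shardForUser($uid)];
}
```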

I would use memcached for your user table, and possibly even for the logs themselves. Memcached lets you store cache entries up to 1 MB in size (its default item size limit), and if you organize your keys well, you can fetch all the latest logs straight from the cache.
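The usual pattern here is cache-aside: try the cache, fall back to MySQL on a miss, then populate the cache. The sketch below codes against a small hypothetical FeedCache interface with an in-process implementation so it runs without a memcached server; with the PECL Memcached extension, get() returns false on a miss and set($key, $value, $expiration) stores a value, so an adapter is only a few lines. The key format "recent_activity:{uid}" and the 60-second TTL are assumptions.

```php
<?php
// Hypothetical minimal cache interface (a Memcached-backed class could implement it).
interface FeedCache
{
    /** Returns null on a cache miss. */
    public function get(string $key): ?array;
    public function set(string $key, array $value, int $ttl): void;
}

// In-process array cache, used here only so the sketch runs standalone.
final class ArrayFeedCache implements FeedCache
{
    private array $items = [];

    public function get(string $key): ?array
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item['expires'] < time()) {
            return null;
        }
        return $item['value'];
    }

    public function set(string $key, array $value, int $ttl): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
    }
}

// Cache-aside read: serve from cache, otherwise call $loadFromDb (e.g. a SELECT ... LIMIT 100) and cache it.
function getRecentActivity(FeedCache $cache, int $uid, callable $loadFromDb, int $ttl = 60): array
{
    $key = "recent_activity:$uid";
    $rows = $cache->get($key);
    if ($rows === null) {
        $rows = $loadFromDb($uid);
        $cache->set($key, $rows, $ttl);
    }
    return $rows;
}
```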

This is more work architecturally, but it will let you work in real time and scale in the future, especially if you want users to start commenting on each entry. ;)

Have you seen this article?

http://bret.appspot.com/entry/how-friendfeed-uses-mysql

+11
Jun 30 '11 at 7:44

Would you consider adding statistical keywords? I made a (crude) implementation by exploding the body of the document, stripping the HTML, removing common words, and counting the most frequent remaining words. I did this a few years ago just for fun (and, as with any such project, the source is long gone), but it worked for my makeshift test blog/forum setup. Perhaps it will work for your news feed...
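The steps described (strip HTML, split into words, drop common words, count) can be sketched as follows. The stop-word list is a tiny illustrative sample, not a real one, and topKeywords is a hypothetical name. Inserting a space before each tag keeps words in adjacent elements from being glued together when strip_tags() removes the markup.

```php
<?php
// Hypothetical crude keyword extractor following the steps in the answer.
function topKeywords(string $html, int $n = 5): array
{
    // Tiny illustrative stop-word list; a real one would be much longer.
    $stopWords = ['the', 'a', 'an', 'and', 'or', 'of', 'to', 'in', 'is', 'it', 'for', 'on', 'with'];

    // Remove HTML (space before each tag so "</p><p>" doesn't join words), normalise case.
    $text  = strtolower(strip_tags(str_replace('<', ' <', $html)));
    // "Explode" into words on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    foreach ($words as $word) {
        if (strlen($word) < 3 || in_array($word, $stopWords, true)) {
            continue; // skip very short and common words
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    arsort($counts);                           // most frequent first
    return array_slice($counts, 0, $n, true);  // keep word => count pairs
}
```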

0
Nov 12 '10 at 6:12

Meanwhile, you can use user flags and caching. Suppose you add a new user field, last_activity. Update this field whenever the user performs any action. Also keep a flag recording up to what time you have generated feeds, say feed_updated_on.

Now change $user->getAllUsers() to return only users whose last_activity is later than feed_updated_on. This excludes all users who have no new activity :). Apply a similar process to users' friends.
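The filter described above might look like this in memory; in SQL it would be a `WHERE last_activity > ?` condition. The field names last_activity and feed_updated_on come from the answer; treating feed_updated_on as a single timestamp for the whole job, and the function name usersWithNewActivity, are assumptions.

```php
<?php
// Hypothetical filter: keep only users who did something since feeds were last generated.
// $users: list of ['uid' => int, 'last_activity' => int Unix timestamp].
function usersWithNewActivity(array $users, int $feedUpdatedOn): array
{
    return array_values(array_filter(
        $users,
        fn(array $u): bool => $u['last_activity'] > $feedUpdatedOn
    ));
}
```

The same function can be applied to each user's friends list, so friends with nothing new are never queried.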

You can also use caching, such as file caching.

Or use a NoSQL database to store each feed as a single document.

0
Jun 26 '11 at 12:24

I am trying to create a Facebook-style news feed myself. Instead of creating another table to log user actions, I calculate the "edge" from a UNION of posts, comments, etc.

With a bit of math, I calculate the "edge" using an exponential decay model with elapsed time as the independent variable; the number of comments, likes, etc. on each post determines its decay constant lambda. At first the edge drops quickly, then it gradually flattens out to almost zero after a few days (but never reaches 0).

When a feed is shown, each edge is multiplied by RAND(), so posts with a higher edge appear more often.

Thus, more popular posts are more likely to appear in the news feed for a longer time.
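One way to read the decay model above: edge = weight × e^(−λ·t), where t is the post's age in hours and λ shrinks as comments and likes accumulate, so popular posts fade more slowly. The base λ of 0.1 per hour, the engagement formula, and the function names are assumptions for illustration; the random factor plays the role of multiplying by RAND() in MySQL, and it is injectable so the ordering can be tested.

```php
<?php
// Hypothetical exponential-decay "edge": weight * exp(-lambda * ageHours).
// More comments/likes => smaller lambda => slower decay. Stays above 0 for any finite age.
function edgeScore(float $ageHours, int $comments, int $likes, float $baseLambda = 0.1, float $weight = 1.0): float
{
    $lambda = $baseLambda / (1 + $comments + $likes);
    return $weight * exp(-$lambda * $ageHours);
}

// Randomised ordering like ORDER BY edge * RAND() DESC: higher-edge posts tend to rank first,
// but lower-edge posts still surface occasionally.
function rankFeed(array $posts, ?callable $rand = null): array
{
    $rand ??= fn(): float => mt_rand() / mt_getrandmax(); // uniform in [0, 1]
    foreach ($posts as &$post) {
        $post['rank'] = edgeScore($post['age_hours'], $post['comments'], $post['likes']) * $rand();
    }
    unset($post); // break the reference left by foreach
    usort($posts, fn(array $a, array $b): int => $b['rank'] <=> $a['rank']);
    return $posts;
}
```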

0
Jun 30 2018-11-11T00:

Instead of running a cron job, use some kind of post-commit script. I don't know specifically what features PHP and MySQL offer in this regard; if I remember correctly, MySQL's InnoDB engine allows more complex features than the other storage engines, but I don't remember whether the latest version has things like triggers.

Anyway, here is a simple approach that doesn't rely on a lot of database magic:

When user X adds content:

1) Make an asynchronous call from your PHP page after the database commit (asynchronous, of course, so that the user viewing the page doesn't have to wait for it!).

The call invokes an instance of your logic script.

2) The logic script goes only through the friends list [A, B, C] of the user who committed the new content (as opposed to the list of everyone in the database!) and adds user X's action to each of these users' feeds.
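Step 2 can be sketched as a push into each friend's feed. To keep the page fast, one option (an assumption, not part of the answer) under PHP-FPM is to call fastcgi_finish_request() after the commit, which sends the response to the browser while the script keeps running; a job queue is the more robust alternative. The in-memory $feeds array stands in for whatever feed store you choose, and fanOutOnWrite is a hypothetical name.

```php
<?php
// Hypothetical fan-out on write: when $posterUid adds content, append the action
// to the feed of each of their friends only (not every user in the database).
function fanOutOnWrite(int $posterUid, array $action, array $friendUids, array &$feeds): void
{
    $item = ['poster_uid' => $posterUid] + $action;
    foreach ($friendUids as $fid) {
        $feeds[$fid][] = $item; // newest items end up at the end of each list
    }
}

// Possible use in the page that handled the post, after the DB commit:
// if (function_exists('fastcgi_finish_request')) {
//     fastcgi_finish_request(); // response sent; the user no longer waits
// }
// fanOutOnWrite($uid, $action, $friendUids, $feeds);
```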

You could simply save these feeds as plain JSON files and append new data to the end of each. It's better, of course, to keep the feeds in a cache backed by the file system, BerkeleyDB, Mongo, or whatever.
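Storing each feed as a plain file could use one JSON object per line (JSON Lines), so appending never requires rewriting the file; file_put_contents() with FILE_APPEND | LOCK_EX appends under an exclusive lock. The file layout and the function names are assumptions.

```php
<?php
// Hypothetical per-user feed files: one JSON-encoded item per line.
function feedPath(string $dir, int $uid): string
{
    return rtrim($dir, '/') . "/feed_$uid.jsonl";
}

// Append one item; LOCK_EX keeps concurrent writers from interleaving lines.
function appendFeedItem(string $dir, int $uid, array $item): void
{
    $line = json_encode($item, JSON_THROW_ON_ERROR) . "\n";
    if (file_put_contents(feedPath($dir, $uid), $line, FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException("Could not append to feed for user $uid");
    }
}

// Read the newest $limit items (reads the whole file; fine for small feeds).
function readFeed(string $dir, int $uid, int $limit = 25): array
{
    $path = feedPath($dir, $uid);
    if (!is_file($path)) {
        return [];
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $items = array_map(fn(string $l): array => json_decode($l, true, 512, JSON_THROW_ON_ERROR), $lines);
    return array_slice(array_reverse($items), 0, $limit); // newest first
}
```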

This is just a basic idea for recency-based feeds rather than relevance-based feeds. You CAN store the data sequentially this way and then do additional processing for each user to filter by relevance, but that is a difficult problem in any application, and probably not one an anonymous web user can easily solve without detailed knowledge of your requirements ;)

Jsh

0
Jun 30 2018-11-11T00:


