Popularity algorithm

I would like to populate the home page of my user-submitted illustration site with the "hottest" illustrations uploaded.

Here are the measures available to me:

  • How many people have favorited the illustration
    • votes table includes the date of the vote
  • When the illustration was uploaded
    • illustration table has a creation date
  • Number of comments (not as useful, since the maximum is about 10 comments at the moment)
    • comments table has a comment date

I have searched around, and most algorithms take the user's reputation into account, but I don't want that to play a part.

I also need to find out whether it is better to do the calculation in the MySQL query that fetches the data, or whether there should be a PHP / cron job that runs every hour or so.

I need only 20 illustrations to fill out the homepage. For this data, I do not need any paging.

How do I weight age against votes? Surely a site with fewer submissions needs less weight on the date added?

+4
4 answers

Many sites that use some kind of popularity ranking do so by using a standard algorithm to determine a score and then decaying that score indefinitely over time. What I have found works better for sites with less traffic is a multiplier that gives a bonus to new content / activity. It is essentially the same idea, but the score stops changing after a period of time of your choosing.

For example, here is a pseudo-example of something you could try. Of course, you will want to adjust how much weight you give each category based on your own experience with your site. Comments are rare but take more effort from the user than favorites / votes, so they should probably carry more weight.

 score = (votes / 10) + comments
 age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
 if (age < 86400) score = score * 1.5

This type of approach gives a bonus to new content uploaded within the last day. If you would like to take a similar approach only for content that has been recently favorited or commented on, you can simply add some WHERE constraints to the query that fetches the scores from the database.
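The pseudo-example above can be sketched in Python roughly as follows. The `Illustration` record, its field names, and `top_illustrations` are hypothetical names made up for illustration; the weights (votes / 10 and a 1.5x bonus during the first 24 hours) are simply the ones from the pseudo-example and would need tuning for a real site.

```python
from dataclasses import dataclass

DAY_SECONDS = 86400  # bonus window taken from the pseudo-example


@dataclass
class Illustration:  # hypothetical record, not a real schema
    title: str
    votes: int
    comments: int
    created_at: float  # Unix timestamp of the upload


def score(item: Illustration, now: float) -> float:
    """A vote is worth a tenth of a comment; content under a day old gets 1.5x."""
    base = item.votes / 10 + item.comments
    age = now - item.created_at
    return base * 1.5 if age < DAY_SECONDS else base


def top_illustrations(items, now, limit=20):
    """Return the highest-scoring items, best first."""
    return sorted(items, key=lambda i: score(i, now), reverse=True)[:limit]
```

With identical votes and comments, an illustration uploaded an hour ago outranks one uploaded ten days ago, which is the intended effect of the bonus.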

There are actually two big reasons NOT to calculate this ranking on the fly.

  • Requiring your database to fetch all of that data and do the calculation on every page load, just to reorder the items, results in an expensive query.
  • Probably a smaller issue, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move around quite drastically.

That leaves you with either caching the results periodically, or setting up a cron job to update a new database column that holds this score.
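The cron approach could look roughly like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL so the example is self-contained; the table and column names (`illustrations`, `votes`, `comments`, `score`, `created_at`, `illustration_id`) are assumptions, not the asker's actual schema.

```python
import sqlite3

DAY_SECONDS = 86400


def refresh_scores(conn: sqlite3.Connection, now: float) -> None:
    """Recompute the cached score column; intended to be run from cron."""
    conn.execute(
        """
        UPDATE illustrations
        SET score = (
            (SELECT COUNT(*) FROM votes
              WHERE votes.illustration_id = illustrations.id) / 10.0
            + (SELECT COUNT(*) FROM comments
                WHERE comments.illustration_id = illustrations.id)
        ) * CASE WHEN ? - created_at < ? THEN 1.5 ELSE 1.0 END
        """,
        (now, DAY_SECONDS),
    )
    conn.commit()


def homepage(conn: sqlite3.Connection, limit: int = 20):
    """Cheap per-request query: only reads the precomputed column."""
    rows = conn.execute(
        "SELECT id FROM illustrations ORDER BY score DESC LIMIT ?", (limit,)
    )
    return [row[0] for row in rows]
```

An hourly job would call `refresh_scores` with the current Unix time, while page loads only ever call `homepage`, so the ranking also stays stable between refreshes.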

+4

Obviously there is some subjectivity in this: there is no single "right" algorithm for determining the proper balance, but I would start with something like votes per unit of age. MySQL can do basic math, so you can ask it to sort by the ratio of votes to age; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like

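 -- votes per second of age, highest first; GREATEST() guards against a zero age
 SELECT images.url
 FROM images
 ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id)
          / GREATEST(TIMESTAMPDIFF(SECOND, images.date, NOW()), 1) DESC
 LIMIT 20

but my SQL is rusty ;-)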

Taking a simple average will, of course, bias the front page in favor of new images. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image was posted. For images that are more recent than that time limit, you would have to normalize by multiplying the number of votes by the time limit and then dividing by the age of the image. Or, alternatively, you could give the votes a time-varying weight, something like exp(-time(vote) + time(image)). And so on and so forth... Depending on how particular you are about what this algorithm should do, it may take some experimentation to figure out which formula gives the best results.
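Both variants described above can be sketched in a few lines of Python. The function names and the `tau` time scale are assumptions added for illustration: the formula in the answer has no explicit scale, but with timestamps in seconds some scale is needed to keep the exponential from vanishing almost immediately.

```python
import math


def windowed_votes(created_at, vote_times, now, window):
    """Count only votes cast within `window` seconds of publication.

    Images younger than the window are scaled by window / age so that
    they are compared on the same footing as older images.
    """
    counted = sum(1 for t in vote_times if 0 <= t - created_at <= window)
    age = now - created_at
    if 0 < age < window:
        return counted * window / age
    return float(counted)


def decayed_votes(created_at, vote_times, tau):
    """Weight each vote by exp(-(vote time - image time) / tau)."""
    return sum(math.exp(-(t - created_at) / tau) for t in vote_times)
```

In the first function an old image simply keeps the votes it collected during its initial window, while in the second every later vote counts for a little less than the one before it.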

+2

I have no useful ideas regarding the actual algorithm, but in terms of implementation, I would suggest caching the result somewhere and updating it periodically. If the calculation results in an expensive query, you probably do not want it to slow down your response times.

0

Something like:

(count favorited + k) / time since last activity

The higher k is, the less weight the number of people who have favorited it carries.

You could also change the time to something like the time since it first appeared + the time since the last activity; this ensures that old illustrations fade away over time.
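A minimal Python sketch of this idea follows, reading the formula as (favorites + k) divided by the time since the last activity, which is an interpretation rather than something the answer spells out. The function names and the one-second floor on the denominator are assumptions added to keep the example safe to run.

```python
def activity_score(favorites, k, since_last_activity):
    """(favorites + k) / seconds since the last vote or comment."""
    return (favorites + k) / max(since_last_activity, 1)


def fading_score(favorites, k, since_created, since_last_activity):
    """Variant that also adds the item's age, so old content fades out."""
    return (favorites + k) / max(since_created + since_last_activity, 1)
```

With k = 0, an illustration with 10 favorites scores twice as high as one with 5 at equal recency; with k = 100 the gap shrinks to 110 versus 105, so recency dominates the ordering.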

0