Feedback on the parameters of the ranking algorithm for my site

Question

I am currently working on writing an algorithm for my new site, which I plan to launch in the near future. The index page will display the “hottest” posts so far. Variables to consider:

Number of votes
How controversial the post is (a number between 0 and 1)
Time since the post was made

I came up with two possible algorithms. The first, and the simplest:

controversial * (numVotesThisHour / denom)
denom = numVotesTotal - numVotesThisHour, or numVotesThisHour if numVotesTotal - numVotesThisHour == 0

The highest number is the hottest
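
For readers who want to experiment with it, here is a minimal Python sketch of the first formula as I read it. The function and parameter names are my own, and returning 0.0 for a post with no votes at all is an assumption, since the question does not say what should happen in that case.

```python
def hotness_v1(controversial: float, votes_this_hour: int, votes_total: int) -> float:
    """First formula: controversy times this hour's votes over earlier votes."""
    earlier_votes = votes_total - votes_this_hour
    # Fallback from the question: if every vote arrived this hour,
    # divide by this hour's votes instead of by zero.
    denom = earlier_votes if earlier_votes != 0 else votes_this_hour
    if denom == 0:
        # Assumption (not in the question): a post with no votes scores 0.
        return 0.0
    return controversial * votes_this_hour / denom
```

Note that with the fallback, any post whose votes all arrived in the current hour scores exactly `controversial`, regardless of how many votes that is.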

My other option is to use an algorithm like Reddit (except that the score decreases over time):

 [controversial * log(x)] - (TimePassed / interval)
 x = numVotesTotal if numVotesTotal >= 10, otherwise 10

The highest number is the hottest
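
A rough Python sketch of the second formula follows. The question does not give the base of the logarithm or the value of interval, so base 10 is an assumption here and the interval is left as a required parameter. All names are my own.

```python
import math


def hotness_v2(controversial: float, votes_total: int,
               seconds_passed: float, interval_seconds: float) -> float:
    """Second formula: log-scaled votes minus a linear time penalty."""
    # Posts with fewer than 10 votes are treated as if they had 10.
    x = max(votes_total, 10)
    # Assumption: base-10 logarithm; the question only says log(x).
    return controversial * math.log10(x) - seconds_passed / interval_seconds
```

Because the time penalty grows without bound while the vote term grows only logarithmically, an old post needs roughly ten times as many votes to offset each additional interval (at controversial = 1), which is why older posts effectively cannot become hot again.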

The first algorithm would allow older messages to become hot again in the future, and the second not.

So my question is: which of these two algorithms do you consider more effective? Which of them, in your opinion, will show the topics that are really "hot" at the moment? Can you think of any advantages or disadvantages to using either of them? I just want to make sure I am not overlooking anything, so that the content is as relevant as possible. Any feedback would be great! Thanks!

math algorithm ranking

Hockeyref45 Nov 13 '12 at 17:22

3 answers

arunlalam · Answer 1 · 2012-11-13T17:49:17+0000

Am I missing something? In the first formula, you have numVotesTotal in the denominator. So a post with more all-time votes will never be as hot, even if it is not that old.

For example, say I have two posts, P1 and P2 (both equally controversial). Let's say P1 has numVotesTotal = 20, and P2 has numVotesTotal = 1000. Now, in the last hour, P1 gets numVotesThisHour = 10, and P2 gets numVotesThisHour = 200.

According to the algorithm, P1 is hotter than P2, even though P2 received twenty times as many votes in the last hour. That makes no sense to me.
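
To check the arithmetic, here is a small Python sketch that plugs P1 and P2 into the first formula. The controversy value of 0.5 is an arbitrary assumption (the example only requires it to be equal for both posts), and the function name is my own.

```python
def first_formula(controversial, votes_this_hour, votes_total):
    earlier_votes = votes_total - votes_this_hour
    # Fallback from the question when all votes arrived this hour.
    denom = earlier_votes if earlier_votes != 0 else votes_this_hour
    return controversial * votes_this_hour / denom


c = 0.5  # assumed: same controversy for both posts

p1 = first_formula(c, votes_this_hour=10, votes_total=20)     # 0.5 * 10 / 10
p2 = first_formula(c, votes_this_hour=200, votes_total=1000)  # 0.5 * 200 / 800

print(p1, p2)  # 0.5 0.125
```

P1 scores four times higher than P2 because the formula measures this hour's votes relative to the post's own history, not relative to other posts.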

Thomas weldon · Answer 2 · 2012-11-13T19:25:58+0000

I think the first algorithm depends too heavily on the instantaneous trend. Think of NASCAR: the current leader may be going 0 mph because he is in a pit stop. The second uses the concept of an average trend. I think both have their uses.

Consider two posts with the same total number of votes and the same controversy rating, where one post receives 20 votes in the first hour and zero in the second, and the other receives 10 in each hour. At the end of the second hour, the first post will be buried by the first algorithm, but the second algorithm will rank them equally.
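
The comparison can be sketched in Python as follows. This is only an illustration under assumptions: a controversy value of 0.5, a base-10 logarithm, a 12-hour interval, and both posts created at the same time. None of these values come from the question, and the function names are mine.

```python
import math


def first_formula(c, votes_this_hour, votes_total):
    earlier_votes = votes_total - votes_this_hour
    denom = earlier_votes if earlier_votes != 0 else votes_this_hour
    return c * votes_this_hour / denom if denom else 0.0


def second_formula(c, votes_total, hours_passed, interval_hours):
    # Assumption: base-10 log; fewer than 10 votes counts as 10.
    return c * math.log10(max(votes_total, 10)) - hours_passed / interval_hours


c = 0.5                # assumed controversy, same for both posts
interval_hours = 12.0  # hypothetical decay interval

# State at the end of the second hour: (votes this hour, votes total).
post_a = (0, 20)   # 20 votes in hour one, none in hour two
post_b = (10, 20)  # 10 votes in each hour

a_first = first_formula(c, *post_a)
b_first = first_formula(c, *post_b)
a_second = second_formula(c, post_a[1], 2, interval_hours)
b_second = second_formula(c, post_b[1], 2, interval_hours)

print(a_first, b_first)      # 0.0 0.5
print(a_second == b_second)  # True
```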

whybird · Answer 3 · 2015-10-09T07:11:06+0000

YMMV, but I think that "hotness" depends entirely on the time frame, and not on the total number of votes, unless your time frame is "all time". Also, it seems to me that the share of all votes cast in the relevant time frame, rather than their absolute number, is the important figure.

You can have several categories of hot:

Hottest this hour
Hottest this week
Hottest since your last visit
Hottest of all time

So, "hottest in the last [whatever]" can be calculated as follows:

 votes_for_topic_in_timeframe / all_votes_in_timeframe

if you specifically need a number from 0 to 1 (useful for comparing between categories). If you only care about ranking within one specific time frame, just take the votes_for_topic_in_timeframe values and sort them in descending order.

If you do not want the user to explicitly select a time frame, you could calculate all (say) four versions (or perhaps only the top three), assign each category a multiplier to give it a relative importance, compute a combined value for each topic, and take the top n. This has the advantage that it can potentially hide from the user the fact that nobody has voted in the last hour ;)
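
One possible way to sketch the weighted combination in Python is shown below. The time frame names, the weights, and the sample vote counts are all made up for illustration, and treating a time frame with no votes at all as a share of 0 is my own assumption.

```python
def share(votes_for_topic, all_votes):
    """votes_for_topic_in_timeframe / all_votes_in_timeframe, in [0, 1]."""
    # Assumption: a time frame with no votes contributes a share of 0.
    return votes_for_topic / all_votes if all_votes else 0.0


def combined_hotness(topic_votes, total_votes, weights):
    """Weighted sum of the topic's vote share in each time frame."""
    return sum(
        weight * share(topic_votes.get(frame, 0), total_votes.get(frame, 0))
        for frame, weight in weights.items()
    )


def top_n(topics, total_votes, weights, n):
    """Names of the n topics with the highest combined hotness."""
    ranked = sorted(
        topics,
        key=lambda name: combined_hotness(topics[name], total_votes, weights),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical multipliers: recent activity counts for more.
weights = {"hour": 4.0, "week": 2.0, "all_time": 1.0}
# Site-wide vote totals per time frame (made-up numbers).
total_votes = {"hour": 10, "week": 100, "all_time": 1000}
topics = {
    "old_favorite": {"hour": 0, "week": 30, "all_time": 500},
    "newcomer": {"hour": 5, "week": 10, "all_time": 20},
}

print(top_n(topics, total_votes, weights, 2))  # ['newcomer', 'old_favorite']
```

If nobody has voted in the last hour, every topic's "hour" share is 0 and the ranking falls back on the longer time frames, which is the hiding effect described above.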
