Leaderboard Implementation

Users on my site create annotations for rap lyrics (example). I want to create a leaderboard to reward the people who create the most annotations.

The leaderboard should track the number of annotations each user has created overall, as well as how many they created in the last week, day, etc.

I have no problem implementing the overall leaderboard:

    @users = User.all

    <table>
      <tr>
        <th>Contributor</th>
        <th>Annotations</th>
      </tr>
      <% @users.sort_by { |u| u.annotations.size }.reverse.each do |u| %>
        <tr>
          <td><%= u %></td>
          <td><%= u.annotations.size %></td>
        </tr>
      <% end %>
    </table>

But when I try to implement (say) a daily scoreboard, I end up repeating the code, and the operation is very slow (because it has to iterate over every annotation in memory instead of relying on the database to do the sorting / counting):

    <table>
      <tr>
        <th>Contributor</th>
        <th>Annotations</th>
      </tr>
      <% @users.sort_by { |u| u.annotations.select { |a| a.created_at > 1.day.ago }.size }.reverse.each do |u| %>
        <tr>
          <td><%= u %></td>
          <td><%= u.annotations.select { |a| a.created_at > 1.day.ago }.size %></td>
        </tr>
      <% end %>
    </table>

What is the best way to implement a daily / weekly scoreboard?

+4
4 answers

Leaderboards in general are a pain to implement. Well, in my experience the actual implementation is pretty straightforward; it's scaling them that is hard. You often have to run a lot of queries that are quite heavy on the database. To handle the daily / weekly boards you will most likely be querying on a datetime column, which means you need an index on that column. That index is really only useful to the leaderboard queries, and it makes every other write to the table pay the price, because the index has to be updated each time.

Another approach is to generate the statistics on a scheduled interval and write them to a separate table, which the leaderboard queries then read from. For example, you have a background task that runs a query every night (it may be an expensive query because it doesn't use a datetime index, but since it runs only once a day, and in a background task, that is acceptable). That query in turn writes to a statistics table that does have an index on its datetime column, and you rewrite your leaderboard page to read those precomputed statistics. Depending on your needs, the cron script may also do other data crunching and precalculation, so that the leaderboard page has to do as little computation as possible.
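The nightly roll-up described above might look roughly like the following. This is only a sketch in plain Ruby with in-memory arrays standing in for the tables; the `AnnotationRow` and `DailyStat` structs and the `roll_up_day` and `leaderboard` helpers are names made up for illustration, not part of the asker's app.

```ruby
require "date"

# Hypothetical stand-ins for the annotations table and the statistics table.
AnnotationRow = Struct.new(:user_id, :created_at)
DailyStat     = Struct.new(:user_id, :day, :annotations_count)

# Nightly job: collapse one day's annotations into one row per user.
def roll_up_day(annotations, day)
  annotations
    .select { |a| a.created_at.to_date == day }
    .group_by(&:user_id)
    .map { |user_id, rows| DailyStat.new(user_id, day, rows.size) }
end

# Leaderboard read: sum the precomputed rows in a date range, highest first.
# Ties are broken by user id so the order is stable.
def leaderboard(stats, from:, to:)
  totals = Hash.new(0)
  stats.each do |s|
    totals[s.user_id] += s.annotations_count if (from..to).cover?(s.day)
  end
  totals.sort_by { |user_id, count| [-count, user_id] }
end
```

The page then only ever sums a handful of small rows per user (one for the daily board, seven for the weekly one) instead of scanning every annotation.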

At this point your leaderboard page is working, but even though it is hitting an indexed table, it still has to read a large number of rows. This assumes you have decent traffic. An indexed query that touches a large number of rows on every page view is still expensive. So now you start thinking about caching the page, possibly storing the data in memcached. After all, since the daily leaderboard data by definition only changes once a day, it is wasteful to rerun these database queries on every page view. It makes sense to cache the daily data in memcached so that each page view only hits memcached.
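The caching step follows the usual fetch-or-compute pattern. The sketch below uses a tiny in-memory class as a stand-in for memcached; `ExpiringCache` and its injectable `clock` are hypothetical names used only to keep the example self-contained.

```ruby
# Minimal in-memory cache with expiry; a stand-in for memcached.
class ExpiringCache
  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @store = {}
  end

  # Return the cached value for key if it has not expired,
  # otherwise run the block, cache its result, and return it.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end
```

In a Rails app the same shape is available as `Rails.cache.fetch(key, expires_in: ...) { ... }`, which can be backed by memcached, so the expensive query inside the block runs only when the cached entry is missing or expired.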

So, as you can see, it's an evolving process. If your traffic is low, you can get away with no separate table and just an index on the datetime column. Running sums, counts, and averages directly can be fine. But it doesn't scale, so you then need to think about breaking the data out into a more optimized structure. And then you notice that the same query is rerun all day while the underlying data doesn't change for 24 hours, and that it is expensive, so you move on to setting up caching. There are a lot of moving parts, and it can get complicated, or really just tedious, fast.

I have battle scars when it comes to leaderboards, and while they are great for game mechanics and for motivating people (everyone loves to see recognition!), they are a pain in the ass to make work at scale.

+10

Have you considered storing these statistics in a separate table / model that is updated by an observer? You are doing a lot of heavy lifting in the view here, which is generally not good practice.
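One way to read this suggestion: bump a per-user, per-day counter at the moment an annotation is created, so the view never has to count anything. The sketch below keeps the counters in memory; `AnnotationCounters` and its methods are hypothetical, and in a real app the `annotation_created` call would sit in whatever observer or callback fires on creation.

```ruby
require "date"

# Hypothetical per-user, per-day counter table, kept in memory for illustration.
class AnnotationCounters
  def initialize
    @counts = Hash.new(0) # key: [user_id, day]
  end

  # Call from the hook that fires when an annotation is created.
  def annotation_created(user_id, created_at)
    @counts[[user_id, created_at.to_date]] += 1
  end

  # Top users since a given day (inclusive), highest count first.
  def top(since:, limit: 10)
    totals = Hash.new(0)
    @counts.each { |(user_id, day), n| totals[user_id] += n if day >= since }
    totals.sort_by { |user_id, n| [-n, user_id] }.first(limit)
  end
end
```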

+3

I would suggest using Redis. You could have a cron-type task that pulls the data from your database and puts it into a Redis sorted set. The sorted set type is probably the best tool there is for storing leaderboards. http://redis.io/topics/data-types
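A sorted set keeps its members ordered by score, so a leaderboard needs just two operations: bump a member's score (the `ZINCRBY` command) and read from the top (`ZREVRANGE` with `WITHSCORES`). The class below is only an in-memory stand-in that mimics those two operations so the idea can be tried without a Redis server; `FakeSortedSet` is a made-up name, and it is not a Redis client.

```ruby
# In-memory stand-in for a Redis sorted set (member => score).
class FakeSortedSet
  def initialize
    @scores = Hash.new(0)
  end

  # Mirrors ZINCRBY: add `by` to the member's score and return the new score.
  def incr(member, by = 1)
    @scores[member] += by
  end

  # Mirrors ZREVRANGE ... WITHSCORES: the top n members, highest score first.
  # Tie order here is by member name and is not meant to match Redis.
  def top(n)
    @scores.sort_by { |member, score| [-score, member] }.first(n)
  end
end
```

With real Redis, one set per period (for example a key per day) would give the daily and weekly boards; the key naming is left to the application.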

+3

In addition to Jeff's suggestion to use Redis, here is a Ruby gem I have used for leaderboards backed by Redis: https://github.com/agoragames/leaderboard

+3