High score database scaling

Question

High score database scaling

I have a simple, high-rated service for an online game, and it has become more popular than expected. A high score is a web service that uses the MYSQL backend with a simple table, as shown below. Each highly rated entry is stored as a row in this table. The problem is that s> 140k lines, I see that some key requests are slowed down so much that in the near future they will be too slow to serve requests.

The main table looks like this:

id is a unique key for each account entry
the game is the identification number of the game that sent the score (currently it is always "1", it will soon have to support more games).
name is the display name for representing this player.
playerId is a unique identifier for this user.
rating is a numerical rating represented by ex 42,035.
time - time of sending
rank is a large integer that uniquely sorts the score for a given game. it’s common for people to strike a specific account, so in this case the connection is broken for those who applied first. Therefore, this field value is approximately equal to "score" 100000000 + (MAX_TIME - time) "

  + ---------- + --------------- + ------ + ----- + --------- + ---------------- +
 |  Field |  Type |  Null |  Key |  Default |  Extra |
 + ---------- + --------------- + ------ + ----- + --------- + ---------------- +
 |  id |  int (11) |  NO |  PRI |  NULL |  auto_increment |
 |  game |  int (11) |  YES |  MUL |  NULL |  |
 |  name |  varchar (100) |  YES |  |  NULL |  |
 |  playerId |  varchar (50) |  YES |  |  NULL |  |
 |  score |  int (11) |  YES |  |  NULL |  |
 |  time |  datetime |  YES |  |  NULL |  |
 |  rank |  decimal (50.0) |  YES |  MUL |  NULL |  |
 + ---------- + --------------- + ------ + ----- + --------- + ---------------- +

Indexes are as follows:

  + ----------- + ------------ + ---------- + ------------- - + ------------- + ----------- + ------------- + -------- - + -------- + ------ + ------------ + --------- +
 |  Table |  Non_unique |  Key_name |  Seq_in_index |  Column_name |  Collation |  Cardinality |  Sub_part |  Packed |  Null |  Index_type |  Comment |
 + ----------- + ------------ + ---------- + ------------- - + ------------- + ----------- + ------------- + -------- - + -------- + ------ + ------------ + --------- +
 |  pozscores |  0 |  PRIMARY |  1 |  id |  A |  138296 |  NULL |  NULL |  |  BTREE |  |
 |  pozscores |  0 |  game |  1 |  game |  A |  NULL |  NULL |  NULL |  YES |  BTREE |  |
 |  pozscores |  0 |  game |  2 |  rank |  A |  NULL |  NULL |  NULL |  YES |  BTREE |  |
 |  pozscores |  1 |  rank |  1 |  rank |  A |  138296 |  NULL |  NULL |  YES |  BTREE |  |
 + ----------- + ------------ + ---------- + ------------- - + ------------- + ----------- + ------------- + -------- - + -------- + ------ + ------------ + --------- +

When a user requests high scores, they usually request about 75 of the best points from an arbitrary point in "sorted in descending order". These queries are usually for "alltime" or just for points over the past 7 days.

A typical query looks like this: "SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 0, 75;" and runs 0.00 seconds.

However, if you request to the end of the list, "SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 10000, 75;" and works after 0.06 s.

"SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 100000, 75;" and works after 0.58 seconds.

It seems that this will quickly start too long, as several thousand new points are awarded every day!

In addition, there are two other types of queries used to search for a specific player by id in an ordered list of ranks. They look like this:

"SELECT * FROM scoretable WHERE game=1 AND time>? AND playerId=? ORDER BY rank DESC LIMIT 1"

and then

"SELECT count(id) as count FROM scoretable WHERE game=1 AND time>? AND rank>[rank returned from above]"

My question is: what can be done to make this a scalable system? I see that the number of rows that grow will be several million in the near future. I was hoping that choosing some smart indexes would help, but the improvement was only marginal.

Update: Here is an explanation line:

  mysql> explain SELECT * FROM scoretable WHERE game = 1 AND time> 0 ORDER BY rank DESC LIMIT 100000, 75;
 + ---- + ------------- + ----------- + ------- + ---------- ----- + ------ + --------- + ------ + -------- + ----------- - +
 |  id |  select_type |  table |  type |  possible_keys |  key |  key_len |  ref |  rows |  Extra |
 + ---- + ------------- + ----------- + ------- + ---------- ----- + ------ + --------- + ------ + -------- + ----------- - +
 |  1 |  SIMPLE |  scoretable |  range |  game |  game |  5 |  NULL |  138478 |  Using where |
 + ---- + ------------- + ----------- + ------- + ---------- ----- + ------ + --------- + ------ + -------- + ----------- - +

Solution found!

I solved the problem thanks to some pointers from this thread. Running a clustered index was exactly what I needed, so I converted the table to use InnoDB in mysql, which supports clustered indexes. Then I deleted the id field and simply set the primary key (ASC game, DESC rank). Now all queries are executed very quickly, no matter what offset I use. The explanation shows that no additional sorting is performed, and it seems that it easily handles all traffic.

+7

mysql

Jake pozansanski Feb 01 '11 at 3:51

source share

1 answer

anon · Accepted Answer · 2011-02-01T06:01:59+0000

Seeing as there are no interlocutors, I will do it. I am from a SQL Server database, but the same ideas apply.

Some general observations:

The ID column is rather meaningless and should not participate in any indexes if there are no other tables / queries that you are not telling us about. In fact, this is not even necessary in your last request. You can do COUNT (*).
Your clustered index should point to the most common queries. Therefore, a clustered index in ASC, DESC time, and DESC rank works well. DESC time sorting is usually a good idea for historical tables like this, when you are usually interested in the most recent things. You can also try a separate index with a rank sorted in the other direction, although I'm not sure how beneficial this is.
Are you sure you need SELECT *? If you can select fewer columns, you can create an index that contains all the columns necessary for your SELECT and WHERE.

1 million lines is really not so much. I created a table similar to yours, with 1,000,000 rows of sample data, and even with one index (ASC game, DESC time and DESC rank), all queries were completed in less than 1 second.

(The only part I'm not sure about is playerId. The queries performed so well that playerId didn't seem necessary. Perhaps you can add it at the end of your clustered index.)

High score database scaling

More articles: