I have a simple, high-rated service for an online game, and it has become more popular than expected. A high score is a web service that uses the MYSQL backend with a simple table, as shown below. Each highly rated entry is stored as a row in this table. The problem is that s> 140k lines, I see that some key requests are slowed down so much that in the near future they will be too slow to serve requests.
The main table looks like this:
- id is a unique key for each account entry
- the game is the identification number of the game that sent the score (currently it is always "1", it will soon have to support more games).
- name is the display name for representing this player.
- playerId is a unique identifier for this user.
- rating is a numerical rating represented by ex 42,035.
- time - time of sending
- rank is a large integer that uniquely sorts the score for a given game. itβs common for people to strike a specific account, so in this case the connection is broken for those who applied first. Therefore, this field value is approximately equal to "score" 100000000 + (MAX_TIME - time) "
+ ---------- + --------------- + ------ + ----- + --------- + ---------------- +
| Field | Type | Null | Key | Default | Extra |
+ ---------- + --------------- + ------ + ----- + --------- + ---------------- +
| id | int (11) | NO | PRI | NULL | auto_increment |
| game | int (11) | YES | MUL | NULL | |
| name | varchar (100) | YES | | NULL | |
| playerId | varchar (50) | YES | | NULL | |
| score | int (11) | YES | | NULL | |
| time | datetime | YES | | NULL | |
| rank | decimal (50.0) | YES | MUL | NULL | |
+ ---------- + --------------- + ------ + ----- + --------- + ---------------- +
Indexes are as follows:
+ ----------- + ------------ + ---------- + ------------- - + ------------- + ----------- + ------------- + -------- - + -------- + ------ + ------------ + --------- +
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+ ----------- + ------------ + ---------- + ------------- - + ------------- + ----------- + ------------- + -------- - + -------- + ------ + ------------ + --------- +
| pozscores | 0 | PRIMARY | 1 | id | A | 138296 | NULL | NULL | | BTREE | |
| pozscores | 0 | game | 1 | game | A | NULL | NULL | NULL | YES | BTREE | |
| pozscores | 0 | game | 2 | rank | A | NULL | NULL | NULL | YES | BTREE | |
| pozscores | 1 | rank | 1 | rank | A | 138296 | NULL | NULL | YES | BTREE | |
+ ----------- + ------------ + ---------- + ------------- - + ------------- + ----------- + ------------- + -------- - + -------- + ------ + ------------ + --------- +
When a user requests high scores, they usually request about 75 of the best points from an arbitrary point in "sorted in descending order". These queries are usually for "alltime" or just for points over the past 7 days.
A typical query looks like this: "SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 0, 75;" and runs 0.00 seconds.
However, if you request to the end of the list, "SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 10000, 75;" and works after 0.06 s.
"SELECT * FROM scoretable WHERE game=1 AND time>? ORDER BY rank DESC LIMIT 100000, 75;" and works after 0.58 seconds.
It seems that this will quickly start too long, as several thousand new points are awarded every day!
In addition, there are two other types of queries used to search for a specific player by id in an ordered list of ranks. They look like this:
"SELECT * FROM scoretable WHERE game=1 AND time>? AND playerId=? ORDER BY rank DESC LIMIT 1"
and then
"SELECT count(id) as count FROM scoretable WHERE game=1 AND time>? AND rank>[rank returned from above]"
My question is: what can be done to make this a scalable system? I see that the number of rows that grow will be several million in the near future. I was hoping that choosing some smart indexes would help, but the improvement was only marginal.
Update: Here is an explanation line:
mysql> explain SELECT * FROM scoretable WHERE game = 1 AND time> 0 ORDER BY rank DESC LIMIT 100000, 75;
+ ---- + ------------- + ----------- + ------- + ---------- ----- + ------ + --------- + ------ + -------- + ----------- - +
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + ----------- + ------- + ---------- ----- + ------ + --------- + ------ + -------- + ----------- - +
| 1 | SIMPLE | scoretable | range | game | game | 5 | NULL | 138478 | Using where |
+ ---- + ------------- + ----------- + ------- + ---------- ----- + ------ + --------- + ------ + -------- + ----------- - +
Solution found!
I solved the problem thanks to some pointers from this thread. Running a clustered index was exactly what I needed, so I converted the table to use InnoDB in mysql, which supports clustered indexes. Then I deleted the id field and simply set the primary key (ASC game, DESC rank). Now all queries are executed very quickly, no matter what offset I use. The explanation shows that no additional sorting is performed, and it seems that it easily handles all traffic.