I have an online game in which I record a lot of gameplay statistics. These stats tables grow very quickly, and I have to be careful, because simply recording more statistics can make game performance very poor once a table gets big enough.
My strategy, which is not very good, is to keep the statistics tables small. I have an automated process that creates a new table every 24 hours, which keeps performance from getting too far out of hand, but the solution is ugly: it amounts to a kind of "rotation" of stats tables. I use InnoDB with several indexes set up for performance, and I maintain 30 of these tables (each one covers 24 hours, so I keep a month of statistics). Every 24 hours, my automated process drops the "stats30" table, renames each of the numbered tables to the next higher number, and then creates a new empty table simply called "statistics". This is the "live" table into which statistics are actively written.
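Schematically, the nightly rotation looks something like this (simplified; the table names match my setup, but the exact statements are illustrative):

```sql
-- Nightly rotation: "statistics" is the live table,
-- stats1 .. stats30 are the archived days, stats30 the oldest.
DROP TABLE stats30;
RENAME TABLE stats29 TO stats30;
-- ... stats28 TO stats29, and so on down to ...
RENAME TABLE stats1 TO stats2;
RENAME TABLE statistics TO stats1;
CREATE TABLE statistics LIKE stats1;  -- fresh, empty live table
```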
These tables basically record every transaction between each pair of players who interact in the game, so the data grows combinatorially. When a new transaction occurs, the code checks whether there is already a row for transactions between those two players during the current day. If there is, it updates that row with the changes from the new transaction; otherwise, it inserts a new row. So a pair of players who interact 1,000 times in a day and a pair who interact just once each end up with exactly one row in the table for that day. Each database action involves a SELECT followed by either an UPDATE or an INSERT, so the workload is fairly evenly split between reads and writes as currently designed. Reading the data in a broader sense, i.e. analyzing statistics across several players, happens very rarely compared to the single-row SELECTs, UPDATEs, and INSERTs. About 150,000 rows are created per day.
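To make the access pattern concrete, here is a minimal sketch of what I mean (the column names are made up for illustration; since each table covers a single day, the player pair alone can serve as the key). MySQL can even collapse the SELECT-then-UPDATE/INSERT round trip into a single statement with INSERT ... ON DUPLICATE KEY UPDATE:

```sql
-- Illustrative layout: one row per interacting player pair,
-- with each table covering one day.
CREATE TABLE statistics (
    player_a   INT NOT NULL,
    player_b   INT NOT NULL,
    num_trades INT NOT NULL DEFAULT 0,
    gold_moved BIGINT NOT NULL DEFAULT 0,
    PRIMARY KEY (player_a, player_b)
) ENGINE=InnoDB;

-- The check-then-write pattern done in one statement:
INSERT INTO statistics (player_a, player_b, num_trades, gold_moved)
VALUES (?, ?, 1, ?)
ON DUPLICATE KEY UPDATE
    num_trades = num_trades + 1,
    gold_moved = gold_moved + VALUES(gold_moved);
```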
I know this could be better. I cannot easily reduce the amount of data I write, but both 1. performance and 2. simplicity bother me. I could squeeze out more performance by creating a new table every 4 hours, for example, but then I would have 180 tables to manage. Conversely, I could simplify things by using just one table, but then everything would slow to a crawl.
Note that I need to update rows in these tables, so I cannot use something like the ARCHIVE storage engine; however, I only ever INSERT into or UPDATE the "live" stats table.
There is also a minor problem: while the daily rotation process is running, any writes arriving at that moment may be lost. (If the process is in the middle of renaming all the tables and creating the new one, new inserts can fail.) Losing a few inserts is not a big deal, but a solution in which this failure cannot happen, or where the rotation is performed "atomically", would be better.
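One direction I am considering (a sketch, not something I have tested yet): MySQL executes a multi-table RENAME TABLE atomically, so the replacement table could be created under a temporary name first and the whole chain swapped in a single statement, leaving no window in which "statistics" is missing:

```sql
-- Pre-create the replacement, then swap everything atomically.
CREATE TABLE statistics_new LIKE statistics;
DROP TABLE stats30;
RENAME TABLE
    stats29        TO stats30,
    -- ... stats28 TO stats29, and so on down to ...
    stats1         TO stats2,
    statistics     TO stats1,
    statistics_new TO statistics;
```

The only step left outside the atomic swap is dropping the oldest table, which never touches the live table.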
Thanks for any ideas that might help! :)