MySQL ideas for a big "rotation" of statistics tables?

I have an online game in which I record a lot of gameplay statistics. These statistics tables grow very quickly, and I have to be careful, because simply recording more statistics can make game performance very poor once a table gets big enough.

My strategy, which is not very good, is to keep the statistics tables small. An automated process creates a new table every 24 hours, which keeps performance from getting too far out of hand, but my solution is ugly: it amounts to a kind of "rotation" of statistics tables. I use InnoDB and have set up several indexes to help performance, and I maintain 30 of these tables (each covers 24 hours, so I keep a month of statistics). Every 24 hours, my automated process drops the "stats30" table, renames each of the numbered tables to the next higher number, and then creates a new empty table called simply "statistics". That is the "live" table into which statistics are actively written.
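
Roughly, the nightly rotation amounts to the following (a sketch only, shown with 3 daily tables instead of 30, and assuming all the tables share the same schema):

    -- drop the oldest day, then shift every table up by one
    DROP TABLE IF EXISTS stats3;
    RENAME TABLE stats2 TO stats3;
    RENAME TABLE stats1 TO stats2;
    RENAME TABLE statistics TO stats1;

    -- recreate the empty "live" table for the next 24 hours
    CREATE TABLE statistics LIKE stats1;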

These tables basically record each transaction between every pair of players who interact in the game, hence an exponential explosion of data. When a new transaction occurs, the code checks whether there is already a row for transactions between those two players for the current day. If there is, it updates that row with the changes from the transaction; otherwise, a new row is created. A pair of players who interact 1,000 times in a day and a pair who interact just once each end up with only one row in the table for that day. Each action against the database involves a SELECT followed by either an UPDATE or an INSERT, so as currently designed it is fairly evenly split between reads and writes. Reads in the larger sense, i.e. analyzing statistics across several players, happen very rarely compared to the single-row SELECT, UPDATE and INSERT. About 150,000 rows are created per day.
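
For concreteness, here is a sketch of what one of these daily tables and the per-transaction write boil down to (the column names are made up for illustration, not my real schema). MySQL's INSERT ... ON DUPLICATE KEY UPDATE can express the SELECT-then-UPDATE-or-INSERT step as a single statement, given a unique key on the player pair:

    -- one row per pair of players per daily table
    CREATE TABLE statistics (
        player_a    INT UNSIGNED NOT NULL,
        player_b    INT UNSIGNED NOT NULL,
        trade_count INT UNSIGNED NOT NULL DEFAULT 0,
        amount      BIGINT       NOT NULL DEFAULT 0,
        PRIMARY KEY (player_a, player_b)
    ) ENGINE=InnoDB;

    -- one round trip instead of SELECT + (UPDATE | INSERT)
    INSERT INTO statistics (player_a, player_b, trade_count, amount)
    VALUES (42, 99, 1, 500)
    ON DUPLICATE KEY UPDATE
        trade_count = trade_count + 1,
        amount      = amount + VALUES(amount);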

I know it could be better. I cannot easily reduce the amount of data I write, but both (1) performance and (2) simplicity bother me. I could squeeze out even more performance by creating a new table every 4 hours, for example, but then I would have 180 tables to manage. Conversely, I could simplify things by using just one table, but then everything grinds to a halt.

Note that I need to update rows in these tables, so I cannot use something like the ARCHIVE storage engine, but I only ever INSERT or UPDATE against the "live" stats table.

There is also a minor problem: while the daily rotation process runs, any requests arriving at that moment may be lost. (If it is in the middle of renaming all the tables and creating the new one, new inserts can fail.) Losing a few inserts is not a big deal, but a solution where this error does not occur, or where the rotation happens "atomically", would be better.
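
For reference, MySQL's multi-table RENAME TABLE runs as a single atomic operation, so one way to close that gap might be to create the replacement table first and do the whole swap in one statement (again a sketch with only 3 daily tables, same assumed names as above):

    -- prepare the replacement live table before touching anything else
    CREATE TABLE statistics_new LIKE statistics;
    DROP TABLE IF EXISTS stats3;

    -- atomic: either all renames happen or none do, and "statistics"
    -- is never missing for incoming writes
    RENAME TABLE
        stats2         TO stats3,
        stats1         TO stats2,
        statistics     TO stats1,
        statistics_new TO statistics;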

Thanks for any ideas that might help! :)

+4
5 answers

With 150 thousand rows a day, what is the average row size? Do these rows contain redundant data that could be reduced by simply storing references instead?

In general, it is always good to keep tables small so that index updates are fast. Also, as mentioned above, you should optimize your queries so that no column is accessed without an index, and so on. You can find possible problems with EXPLAIN and in your MySQL server's slow query log, if you have it turned on.

One thing that can help with the performance issues is the memcached daemon. Using it, you can defer writes to your database, taking some of the steam out while still not suffering from dirty caches and the like. Depending on the structure of the application (and any framework used), implementing it takes some work, though.

For archiving and statistics purposes, I suggest you take a look at InfoBright ( http://www.infobright.org/ ). It is an open-source, drop-in replacement for MySQL (based on MySQL) whose intended purpose is data warehousing, but you can use it for all kinds of large-volume data analysis. It has a very nice compression feature, which in our case reduces ~23 TB of raw data to about 1.2 TB of compressed data. Needless to say, requesting individual rows of compressed data can and will be quite slow, but for statistics it is pretty darn fast. So if you are not requesting specific rows but instead analyzing things like "how many rows were updated with value foo > bar between Dec 08 and Feb 09", it gives very good performance. In fact, as you use the database, it analyzes your usage and builds a knowledge grid that optimizes the data for your specific queries.
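
A query of that shape would look roughly like this (column names made up for illustration):

    SELECT COUNT(*)
    FROM   stats
    WHERE  foo > bar
      AND  updated_at BETWEEN '2008-12-01' AND '2009-02-28';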

The next question that comes to my mind is... if it is "only" day-to-day data / session data that you hold for a day or a few hours, is a relational database the right tool for the job? Not knowing the exact nature of your application, I could imagine that in-memory sessions (which could live in a Terracotta cluster, for example) that write a transaction log and persist their data every so often might be a better fit. But, as I said, it depends greatly on the nature of your application and the amount of data in question.

+2

I am no database guru, but have you tried using a different storage engine (InnoDB is slow, try MyISAM) and made sure that the data you SELECT on is indexed?

It might seem like a dumb question, but your current solution does seem pretty crazy to me.

Also try http://forums.mysql.com/read.php?24.92131.92131 for performance-tuning tips.

0

The first thing to do is profile your application to find out what actually takes the most time. You may not need to keep this table rotation at all if another change gives better performance.

Run EXPLAIN on all your queries, make sure you understand the results, and optimize your queries accordingly.
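
For example (the query and column names here are just placeholders):

    EXPLAIN SELECT trade_count
    FROM   statistics
    WHERE  player_a = 42 AND player_b = 99;
    -- in the output, check that "key" shows the index you expect,
    -- that "rows" is small, and that "type" is not ALL (a full table scan)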

I also highly recommend reading the MySQL reference manual's chapter on optimization. It gives a lot of advice on how to tune the server for optimal performance and can give you an idea of what might be causing performance bottlenecks.

If the performance of the live table really is a problem and you need to keep the number of rows in it down, simply move rows into an archive table. That way you always have a slim live table and can easily run longer queries against the archive.

0

How many rows are we talking about here? Millions? Hundreds of thousands?

Since you said you need to update rows in the live table, it sounds as if you could do this with just two tables: a live stats table and a stats_archive table. A batch job would simply move rows from the live stats table to the stats_archive table. Another job could purge rows from the archive table once they reach a certain age. It would also be a good idea to have a third job that periodically recomputes statistics on the archive table so that MySQL can build good plans for your queries.
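
Roughly, those jobs could look like this (table and column names are assumptions, and the live table would need a date column):

    -- job 1: move yesterday's rows from the live table to the archive
    INSERT INTO stats_archive SELECT * FROM stats WHERE stat_date < CURDATE();
    DELETE FROM stats WHERE stat_date < CURDATE();

    -- job 2: purge archived rows older than 30 days
    DELETE FROM stats_archive WHERE stat_date < CURDATE() - INTERVAL 30 DAY;

    -- job 3: refresh index statistics so the optimizer keeps making good plans
    ANALYZE TABLE stats_archive;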

0

I would need to know more about your domain to give a more precise answer, but the short answer is that you will need some kind of partitioning based on month, year or geographic location. That way, all statistics for users belonging to a certain city, for example, end up in particular tables (or partitions), and your SELECTs can be faster.
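
For example, with MySQL's range partitioning (5.1 and up; the schema here is assumed), a whole month can be dropped as a cheap metadata operation instead of rotating entire tables:

    CREATE TABLE stats (
        player_a  INT UNSIGNED NOT NULL,
        player_b  INT UNSIGNED NOT NULL,
        stat_date DATE         NOT NULL,
        amount    BIGINT       NOT NULL DEFAULT 0,
        PRIMARY KEY (player_a, player_b, stat_date)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(stat_date)) (
        PARTITION p200901 VALUES LESS THAN (TO_DAYS('2009-02-01')),
        PARTITION p200902 VALUES LESS THAN (TO_DAYS('2009-03-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- dropping a month is a metadata operation, not a huge DELETE
    ALTER TABLE stats DROP PARTITION p200901;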

And, of course, the usual yada yada about adding indexes ...

Tell me more, I can help you ...

0
