Best practices for historical data in a MySQL database

Recently, I have been thinking about the best way to store historical data in a MySQL database. At the moment, each versioned table has two columns, valid_from and valid_to, both of type DATETIME. A record holding current data has valid_from set to its creation date. When I update that row, I set its valid_to to the time of the update and add a new record whose valid_from equals the previous row's valid_to. Simple stuff. But I know the table will grow huge very quickly, so fetching data could become very slow.
Do you have any recommended practices for storing historical data?
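
For reference, the update pattern described in the question could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL so that it runs standalone, the table name product_price and the helper set_price are hypothetical, and it assumes the current row is marked by valid_to being NULL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption, for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_price (
        product_id INTEGER NOT NULL,
        price      REAL    NOT NULL,
        valid_from TEXT    NOT NULL,  -- would be DATETIME in MySQL
        valid_to   TEXT               -- NULL marks the current row (assumption)
    )
""")

def set_price(conn, product_id, price, now):
    """Close the current row, if any, and open a new one at the same instant."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE product_price SET valid_to = ? "
            "WHERE product_id = ? AND valid_to IS NULL",
            (now, product_id),
        )
        conn.execute(
            "INSERT INTO product_price (product_id, price, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)",
            (product_id, price, now),
        )

set_price(conn, 1, 9.99, "2024-01-01 00:00:00")
set_price(conn, 1, 12.50, "2024-03-01 00:00:00")

rows = conn.execute(
    "SELECT price, valid_from, valid_to FROM product_price ORDER BY valid_from"
).fetchall()
```

After the two calls, the first row is closed with a valid_to equal to the second row's valid_from, and only the second row is still open.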

+7
3 answers

Worrying about "large" tables and performance is a common mistake. If you can use indexes to access your data, it doesn't matter whether you have 1,000 or 1,000,000 records, at least not in any way you could measure. The design you describe is commonly used; it's a great design where time is a key part of the business logic.

For example, if you want to find out what the price of an item was when the customer placed the order, being able to search for product records where valid_from < order_date and valid_until is either null or > order_date is by far the easiest solution.
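
A minimal sketch of such a point-in-time lookup, again using Python's sqlite3 module as a stand-in for MySQL. The helper price_at and the sample rows are hypothetical, and the end column is named valid_to as in the question. The sketch uses <= on valid_from so that a timestamp falling exactly on a boundary matches the newer row; that boundary choice is an assumption, not something the answer specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER, price REAL, "
    "valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, 9.99, "2024-01-01 00:00:00", "2024-03-01 00:00:00"),  # expired
        (1, 12.50, "2024-03-01 00:00:00", None),                  # current
    ],
)

def price_at(conn, product_id, when):
    """Return the price that was valid at `when`, or None if no row matches."""
    row = conn.execute(
        "SELECT price FROM product "
        "WHERE product_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (product_id, when, when),
    ).fetchone()
    return row[0] if row else None

old_price = price_at(conn, 1, "2024-02-15 10:00:00")  # order placed in February
new_price = price_at(conn, 1, "2024-03-01 00:00:00")  # exactly on the boundary
no_price = price_at(conn, 1, "2023-12-31 00:00:00")   # before any record exists
```

The timestamps are stored as ISO-formatted text here, which compares correctly as strings; in MySQL the columns would be DATETIME and the same comparisons apply.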

This is not always the case. If you keep the data only for archiving purposes, it may make sense to create archive tables. However, you must be sure that time really is not part of the business logic; otherwise the pain of searching multiple tables will be significant. Imagine having to look in either the product table or the product_archive table every time you want to find out the price of a product at the time an order was placed.
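
To make the cost of the two-table layout concrete, here is a rough sketch of the same price lookup when expired rows live in product_archive. It uses Python's sqlite3 module as a stand-in for MySQL; the column layout and sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "(product_id INTEGER, price REAL, valid_from TEXT, valid_to TEXT)"
conn.execute("CREATE TABLE product " + cols)          # current rows only
conn.execute("CREATE TABLE product_archive " + cols)  # expired rows only
conn.execute(
    "INSERT INTO product VALUES (1, 12.50, '2024-03-01 00:00:00', NULL)"
)
conn.execute(
    "INSERT INTO product_archive VALUES "
    "(1, 9.99, '2024-01-01 00:00:00', '2024-03-01 00:00:00')"
)

# Every point-in-time query now has to combine both tables.
row = conn.execute(
    """
    SELECT price FROM (
        SELECT price, valid_from, valid_to FROM product
        WHERE product_id = :pid
        UNION ALL
        SELECT price, valid_from, valid_to FROM product_archive
        WHERE product_id = :pid
    ) AS all_versions
    WHERE valid_from <= :d AND (valid_to IS NULL OR valid_to > :d)
    """,
    {"pid": 1, "d": "2024-02-15 10:00:00"},
).fetchone()
```

With a single table, the UNION ALL and the duplicated filter disappear, which is the simplification the answer is arguing for.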

+7

This is an incomplete answer, just a few suggestions.

You can add an indexed boolean field, for example is_valid. This should improve performance on a large table that holds both historical and current records.

In general, storing historical data in a separate table can complicate your application (just imagine the complexity of a query that has to return a mix of current and historical records ...).

Computers today are very fast. I think you should benchmark both designs: a single table, and a separate table for historical records.

Also, test on your own hardware how fast MySQL is with large tables before deciding how to design the database. If it is too slow for you, you can tune the MySQL configuration (start by increasing the cache / RAM).

0

I am nearing the end of building an application that does just that. Most tables are indexed by the key fields first and then by the valid_to field, which is set to NULL for current records; this makes it easy to find current records almost instantly. Since most of my application works with current data, the indexes give fast performance. From time to time someone needs to see historical records, and in that case there is a performance hit, but in testing it has not been bad, since most records do not change much over their lifetime.

In cases where you may have many more expired records, across different keys, than current records, it can pay to put valid_to ahead of the key fields in the index.
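
The two index orders discussed in this answer could be sketched as follows. This is an illustration only: it runs against SQLite through Python's sqlite3 module rather than MySQL, the table, index names and sample rows are hypothetical, and which order actually performs better depends on your data and should be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER, price REAL, "
    "valid_from TEXT, valid_to TEXT)"
)

# Key field first, then valid_to: suits looking up one key's current row.
conn.execute("CREATE INDEX idx_key_valid_to ON product (product_id, valid_to)")
# valid_to first: may suit fetching all current rows when expired rows dominate.
conn.execute("CREATE INDEX idx_valid_to_key ON product (valid_to, product_id)")

conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, 9.99, "2024-01-01 00:00:00", "2024-03-01 00:00:00"),
        (1, 12.50, "2024-03-01 00:00:00", None),
        (2, 3.00, "2024-02-01 00:00:00", None),
    ],
)

# Current row for a single key.
one_current = conn.execute(
    "SELECT price FROM product WHERE product_id = ? AND valid_to IS NULL",
    (1,),
).fetchone()

# All current rows, regardless of key.
all_current = conn.execute(
    "SELECT product_id, price FROM product "
    "WHERE valid_to IS NULL ORDER BY product_id"
).fetchall()
```

In practice you would probably keep only one of the two indexes; both are created here purely to show the column order.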

0
