How do I run a big update on a live database?

The web project I am coding uses a sophisticated CSV-to-MySQL converter to build its own database. This means that to update the db contents with the latest changes from the CSV, a converter is executed that truncates the corresponding tables (but leaves alone others that are filled through the website) and refills them with data from the CSV.

Yes, this is not an ideal process, but there are good reasons why this approach was chosen over the standard "work directly on the live db" approach.

What I'm struggling with is the best way to run this update process without causing too much harm to users. A few numbers to keep in mind:

1) This process should be performed regularly, somewhere between every few weeks and once a month
2) The db conversion currently takes about an hour and will probably take up to 15 hours in the future, at least if the forecasts regarding the growth of the database are correct (yes, ouch!)
3) The SQL dump of the full database is currently less than 20 MB (which makes it easy to import via phpMyAdmin), but it will soon break that barrier. I think that should not be a problem, since I can just import it over SSH instead.

Here are some of the alternatives I have been thinking about. They all use a separate database with global settings (these flags are checked on every read/write on the site). Alternative 2 seems to be the worst, as it blocks read access for the whole conversion time, which, as I said, can be quite long. All of them block write access for roughly the same long period, but that only stops users from signing up or the like, nothing critical. I am particularly curious whether the third option is feasible, since it would in theory reduce the read downtime: I would not need to upload a large dump.
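To make that gate concrete, here is a minimal sketch of what such a flag check could look like, assuming a `globalsettings` table with boolean `readable`/`writeable` columns and a Python client using PyMySQL (the table, column names, and library choice are illustrative, not from the project):

```python
# Minimal sketch of the "global settings" gate, assuming a table named
# `globalsettings` with boolean columns `readable` and `writeable`
# (table/column names and the PyMySQL library are illustrative choices).
import pymysql

def get_flags(conn):
    """Read the current read/write flags before serving a request."""
    with conn.cursor() as cur:
        cur.execute("SELECT readable, writeable FROM globalsettings LIMIT 1")
        readable, writeable = cur.fetchone()
    return bool(readable), bool(writeable)

def set_flag(conn, column, value):
    """Flip one flag, e.g. before/after the converter runs."""
    # `column` comes from trusted code, not user input, so the f-string is acceptable here.
    with conn.cursor() as cur:
        cur.execute(f"UPDATE globalsettings SET {column} = %s", (int(value),))
    conn.commit()

conn = pymysql.connect(host="localhost", user="app", password="secret",
                       database="settings_db")
readable, writeable = get_flags(conn)
if not writeable:
    print("site is in read-only maintenance mode")
```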

Has anyone done something like this? I would appreciate better alternatives if there are any, or any feedback on how to improve these and whether to go with 1 or 3. Thanks in advance :)

Alternative 1
1) Set globalsettings_booleans_writeable to 0
2) Download current DB (SQL dump)
3) Import downloaded DB locally
4) Run converter (in update mode) on local database
5) Export local DB
6) Set globalsettings_booleans_readable to 0
7) Import exported DB online
8) Set globalsettings_booleans_readable to 1
9) Set globalsettings_booleans_writeable to 1
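For illustration, here is a rough sketch of how Alternative 1 could be scripted end to end from a local machine, assuming the `mysqldump`/`mysql` command-line tools. Host names, credentials, database names, the converter command, and the `settings_db.globalsettings` flag table are all placeholders:

```python
# Rough sketch of scripting Alternative 1 end to end from a local machine.
# Host names, credentials, database names, and the converter command are
# placeholders; "settings_db.globalsettings" is the assumed flag table.
import subprocess

REMOTE = ["-h", "db.example.com", "-u", "app", "-psecret"]
LOCAL  = ["-h", "127.0.0.1",      "-u", "app", "-psecret"]

def run(cmd, stdin=None, stdout=None):
    subprocess.run(cmd, stdin=stdin, stdout=stdout, check=True)

def set_flag(column, value):
    sql = f"UPDATE settings_db.globalsettings SET {column} = {int(value)}"
    run(["mysql", *REMOTE, "-e", sql])

set_flag("writeable", 0)                                   # step 1: block writes

with open("dump.sql", "wb") as out:                        # step 2: dump live db
    run(["mysqldump", *REMOTE, "webdb"], stdout=out)

with open("dump.sql", "rb") as src:                        # step 3: import locally
    run(["mysql", *LOCAL, "webdb"], stdin=src)

run(["php", "converter.php", "--update"])                  # step 4: run converter (placeholder command)

with open("converted.sql", "wb") as out:                   # step 5: export local result
    run(["mysqldump", *LOCAL, "webdb"], stdout=out)

set_flag("readable", 0)                                    # step 6: block reads
with open("converted.sql", "rb") as src:                   # step 7: import online
    run(["mysql", *REMOTE, "webdb"], stdin=src)
set_flag("readable", 1)                                    # step 8: re-enable reads
set_flag("writeable", 1)                                   # step 9: re-enable writes
```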

Alternative 2
1) Set globalsettings_booleans_writeable to 0
2) Set globalsettings_booleans_readable to 0
3) Run converter (in update mode) on live database
4) Set globalsettings_booleans_readable to 1
5) Set globalsettings_booleans_writeable to 1

Alternative 3
1) Set globalsettings_booleans_writeable to 0
2) Create a remote copy of the database
3) Run converter (in update mode) on remote copy
4) Set globalsettings_booleans_readable to 0
5) Replace remote original with remote copy (easy?)
6) Set globalsettings_booleans_readable to 1
7) Set globalsettings_booleans_writeable to 1
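Regarding step 5 of Alternative 3: one way the swap can be made cheap is MySQL's `RENAME TABLE`, which is atomic and can move tables between schemas on the same server, so the converted copy can be swapped in per table without reloading a dump. A sketch, with placeholder schema and table names:

```python
# Sketch of step 5 of Alternative 3 using MySQL's atomic RENAME TABLE to swap
# the converted copy in per table.  Schema names (webdb, webdb_copy, webdb_old)
# and the table list are placeholders; webdb_old must already exist.
import pymysql

CSV_TABLES = ["products", "categories", "prices"]   # tables the converter rebuilds

conn = pymysql.connect(host="db.example.com", user="app", password="secret")
with conn.cursor() as cur:
    for t in CSV_TABLES:
        # Each RENAME TABLE statement is atomic: readers see either the old
        # table or the converted one, never a half-replaced state.
        cur.execute(
            f"RENAME TABLE webdb.{t} TO webdb_old.{t}, webdb_copy.{t} TO webdb.{t}"
        )
```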

1 answer

It seems to me that much of this disruption could be avoided by examining the CSV to see which records would actually lead to a database change. It sounds like the CSV generator is the actual data source and the database is just a mirror, right?

If so, the CSV records that do not result in changes can be ignored, the db tables do not need to be truncated, and the remaining CSV records can be applied using Alternative 2, which should then take only a few minutes.

The main weakness of this approach shows up when records are deleted at the source: there is nothing in the CSV to tell the db that it should delete them locally.
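For what it's worth, a sketch of the diff-and-apply idea for one table, assuming a `products` table keyed by `id` and a CSV with matching columns (all names are illustrative). If the CSV is a complete export, comparing key sets would also catch the deletions mentioned above:

```python
# Sketch of the diff-and-apply idea for one table.  Table name, primary key,
# and columns are assumptions; values are compared as strings, so real code
# would need to normalise types (decimals, dates, NULLs) before comparing.
import csv
import pymysql

conn = pymysql.connect(host="db.example.com", user="app", password="secret",
                       database="webdb")

def load_csv(path):
    with open(path, newline="") as f:
        return {row["id"]: row for row in csv.DictReader(f)}

def load_db(cur):
    cur.execute("SELECT id, name, price FROM products")
    return {str(r[0]): {"id": str(r[0]), "name": r[1], "price": str(r[2])}
            for r in cur.fetchall()}

csv_rows = load_csv("products.csv")
with conn.cursor() as cur:
    db_rows = load_db(cur)

    for key, row in csv_rows.items():
        if db_rows.get(key) == row:
            continue                                  # unchanged row: skip it
        cur.execute(
            "INSERT INTO products (id, name, price) VALUES (%s, %s, %s) "
            "ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)",
            (row["id"], row["name"], row["price"]),
        )

    # If the CSV is a complete export, keys present in the db but missing from
    # the CSV are exactly the records deleted at the source.
    for key in set(db_rows) - set(csv_rows):
        cur.execute("DELETE FROM products WHERE id = %s", (key,))

conn.commit()
```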
