Best practice for version control of data in SQL databases

My database occasionally has entries that are incorrect, but instead of modifying the data in place, I would like to keep a record of the changes (a revision history).

These changes occur very rarely.

Ideally, something like this:

(original table fields) | revision_version | origin | user | timestamp 

Say I had a table called posts with the following schema:

 title | description | timestamp | author 

An additional table would be created, called posts_revisions :

 title | description | timestamp | author | revision_version | origin | user | timestamp 
  • origin : the source of the change, be it a bot, a user, or what have you.
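A minimal DDL sketch of that layout might look like the following. The column types, the id key, and the renamed columns are assumptions on top of the question: the second timestamp becomes changed_at to avoid clashing with the post's own timestamp, and user becomes changed_by since USER is a reserved word in most databases.

    CREATE TABLE posts (
        id          INT PRIMARY KEY,
        title       VARCHAR(255),
        description TEXT,
        created_at  TIMESTAMP,        -- the "timestamp" field from the schema above
        author      VARCHAR(100)
    );

    -- One row per saved version of a post.
    CREATE TABLE posts_revisions (
        id               INT,             -- which post this revision belongs to
        title            VARCHAR(255),
        description      TEXT,
        created_at       TIMESTAMP,
        author           VARCHAR(100),
        revision_version INT,             -- increments per change to the same post
        origin           VARCHAR(50),     -- bot, user, import, ...
        changed_by       VARCHAR(100),    -- the "user" column from the sketch
        changed_at       TIMESTAMP,       -- when this revision was recorded
        PRIMARY KEY (id, revision_version)
    );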

As you can imagine, this is a pretty big change to the existing database, and my current concern is having to check the _revisions tables on every query. Is there a best practice for this kind of thing?

+4
2 answers

For this type of problem, I keep a current table and a history table.

The history table has the following additional columns:

  • HistoryID
  • EffectiveDate
  • EndDate
  • VersionNumber
  • CreatedBy
  • CreatedAt

The effective and end dates delimit the time interval during which the values were valid. The version number simply increases with every change to the record. HistoryID, CreatedAt and CreatedBy are columns that I put in almost every table in the database.
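As a rough sketch, using the posts example from the question (the answer only lists the extra columns, so the names and types here are assumptions):

    -- Current table: always holds the latest values only.
    CREATE TABLE posts (
        post_id     INT PRIMARY KEY,
        title       VARCHAR(255),
        description TEXT,
        author      VARCHAR(100),
        created_by  VARCHAR(100),
        created_at  TIMESTAMP
    );

    -- History table: one row per version of a post.
    CREATE TABLE posts_history (
        history_id     INT PRIMARY KEY,   -- in practice auto-generated (IDENTITY/sequence)
        post_id        INT,
        title          VARCHAR(255),
        description    TEXT,
        author         VARCHAR(100),
        effective_date TIMESTAMP,         -- when this version became current
        end_date       TIMESTAMP,         -- NULL while the version is still current
        version_number INT,               -- increases with every change to the record
        created_by     VARCHAR(100),
        created_at     TIMESTAMP
    );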

Typically, I keep the history table up to date with nightly jobs that compare the tables and then use MERGE to combine the data. An alternative is to route all changes through stored procedures that update both tables. Another alternative is to use triggers that fire when a change occurs; however, I shy away from triggers, preferring the first two alternatives.
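A sketch of how such a nightly compare-and-merge could look, in SQL Server-style T-SQL against the hypothetical posts/posts_history layout above (assuming history_id is auto-generated; other databases would need different syntax, and NULL-safe comparisons are omitted for brevity):

    -- Close open history rows whose values no longer match, and add rows for brand-new posts.
    MERGE posts_history AS h
    USING posts AS p
       ON p.post_id = h.post_id AND h.end_date IS NULL
    WHEN MATCHED AND (p.title <> h.title
                   OR p.description <> h.description
                   OR p.author <> h.author)
        THEN UPDATE SET end_date = SYSDATETIME()
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (post_id, title, description, author,
                     effective_date, end_date, version_number, created_by, created_at)
             VALUES (p.post_id, p.title, p.description, p.author,
                     SYSDATETIME(), NULL, 1, SUSER_SNAME(), SYSDATETIME());

    -- Open a new version row for every post whose previous version was just closed.
    INSERT INTO posts_history (post_id, title, description, author,
                               effective_date, end_date, version_number, created_by, created_at)
    SELECT p.post_id, p.title, p.description, p.author,
           SYSDATETIME(), NULL,
           1 + (SELECT MAX(h2.version_number)
                FROM posts_history AS h2
                WHERE h2.post_id = p.post_id),
           SUSER_SNAME(), SYSDATETIME()
    FROM   posts AS p
    WHERE  NOT EXISTS (SELECT 1 FROM posts_history AS h
                       WHERE h.post_id = p.post_id AND h.end_date IS NULL);

A trigger-based or stored-procedure-based version would perform the same close-and-insert on each change instead of in a nightly batch.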

I must admit that disk space is not a big consideration for these tables, so there is no problem with storing the data twice, once in the current table and once in the history. It would only be a minor tweak to keep superseded versions in the history table alone, with the current entries only in the "current" table.

One disadvantage of this approach is handling changes to the structure of the base table: if you want to add a column, you need to add it to the history table as well as to the base table.

+2

If the tables are used for reporting purposes (especially by business users, if they have some SQL access), I think it's better to move the old data out and put it in another table. Flags and revision columns in the main table are sometimes fine, but as soon as you need to do something along the lines of select sum(someVar) where revision_version = max(revision_version) and someID = ID , it really goes beyond simple.
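For example, pulling just the latest revision of each row out of a combined revisions table tends to end up looking something like this (an illustrative sketch against the hypothetical posts_revisions layout from the question, not necessarily the exact query the answer had in mind):

    -- Latest revision of each post, via a window function.
    SELECT id, title, description, author, revision_version
    FROM (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY r.id
                                  ORDER BY r.revision_version DESC) AS rn
        FROM posts_revisions AS r
    ) AS latest
    WHERE rn = 1;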

If you have a table that is used for quick-and-dirty data collection, replace the data in place and, if necessary, move the old rows into the revision table. If only an application will access it, and it is not a performance issue, then keeping everything in the main table is fine.
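A sketch of that replace-and-archive pattern, again using the hypothetical posts / posts_revisions tables, with placeholder values (post id 42, the 'editor_name' user); transaction syntax varies by database:

    BEGIN TRANSACTION;

    -- Park the row as it currently stands, stamped with the next revision number.
    INSERT INTO posts_revisions (id, title, description, created_at, author,
                                 revision_version, origin, changed_by, changed_at)
    SELECT p.id, p.title, p.description, p.created_at, p.author,
           COALESCE((SELECT MAX(r.revision_version)
                     FROM posts_revisions AS r
                     WHERE r.id = p.id), 0) + 1,
           'user',           -- origin: placeholder value
           'editor_name',    -- who made the correction: placeholder value
           CURRENT_TIMESTAMP
    FROM posts AS p
    WHERE p.id = 42;

    -- Then apply the correction to the main table.
    UPDATE posts
    SET    title = 'Corrected title'
    WHERE  id = 42;

    COMMIT;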

+1
