Best practice for version control of data in SQL databases

My database occasionally has entries that are incorrect, but instead of modifying the data in place, I would like to keep a record of the changes (a revision history).

These changes occur very rarely.

Ideally, something like this:

(original table fields) | revision_version | origin | user | timestamp 

Say I had a table called posts with the following schema:

 title | description | timestamp | author 

An additional table would be created, called posts_revisions :

 title | description | timestamp | author | revision_version | origin | user | timestamp 
  • origin : the source of the change, be it a bot, a user, or what have you.
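A minimal DDL sketch of that layout might look like the following. The column types, the id key, and the renamed columns are assumptions on top of the question: the second timestamp becomes changed_at to avoid clashing with the post's own timestamp, and user becomes changed_by since USER is a reserved word in most databases.

    CREATE TABLE posts (
        id          INT PRIMARY KEY,
        title       VARCHAR(255),
        description TEXT,
        created_at  TIMESTAMP,        -- the "timestamp" field from the schema above
        author      VARCHAR(100)
    );

    -- One row per saved version of a post.
    CREATE TABLE posts_revisions (
        id               INT,             -- which post this revision belongs to
        title            VARCHAR(255),
        description      TEXT,
        created_at       TIMESTAMP,
        author           VARCHAR(100),
        revision_version INT,             -- increments per change to the same post
        origin           VARCHAR(50),     -- bot, user, import, ...
        changed_by       VARCHAR(100),    -- the "user" column from the sketch
        changed_at       TIMESTAMP,       -- when this revision was recorded
        PRIMARY KEY (id, revision_version)
    );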

As you can imagine, this is a pretty big change to the existing database, and my current concern is having to check the _revisions tables on every query. Is there a best practice for this kind of thing?

+4
2 answers

For this type of problem, I keep a current table and a history table.

The history table has the following additional columns:

  • HistoryID
  • EffectiveDate
  • EndDate
  • VersionNumber
  • CreatedBy
  • CreatedAt

The effective and end dates delimit the time interval during which the values were valid. The version number simply increases with every change to the record. HistoryID, CreatedAt and CreatedBy are columns that I put in almost every table in the database.
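As a rough sketch, using the posts example from the question (the answer only lists the extra columns, so the names and types here are assumptions):

    -- Current table: always holds the latest values only.
    CREATE TABLE posts (
        post_id     INT PRIMARY KEY,
        title       VARCHAR(255),
        description TEXT,
        author      VARCHAR(100),
        created_by  VARCHAR(100),
        created_at  TIMESTAMP
    );

    -- History table: one row per version of a post.
    CREATE TABLE posts_history (
        history_id     INT PRIMARY KEY,   -- in practice auto-generated (IDENTITY/sequence)
        post_id        INT,
        title          VARCHAR(255),
        description    TEXT,
        author         VARCHAR(100),
        effective_date TIMESTAMP,         -- when this version became current
        end_date       TIMESTAMP,         -- NULL while the version is still current
        version_number INT,               -- increases with every change to the record
        created_by     VARCHAR(100),
        created_at     TIMESTAMP
    );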

Typically, I keep the history table up to date with nightly jobs that compare the tables and then use MERGE to combine the data. An alternative is to route all changes through stored procedures that update both tables. Another alternative is to use triggers that fire when a change occurs; however, I shy away from triggers, preferring the first two alternatives.
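A sketch of how such a nightly compare-and-merge could look, in SQL Server-style T-SQL against the hypothetical posts/posts_history layout above (assuming history_id is auto-generated; other databases would need different syntax, and NULL-safe comparisons are omitted for brevity):

    -- Close open history rows whose values no longer match, and add rows for brand-new posts.
    MERGE posts_history AS h
    USING posts AS p
       ON p.post_id = h.post_id AND h.end_date IS NULL
    WHEN MATCHED AND (p.title <> h.title
                   OR p.description <> h.description
                   OR p.author <> h.author)
        THEN UPDATE SET end_date = SYSDATETIME()
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (post_id, title, description, author,
                     effective_date, end_date, version_number, created_by, created_at)
             VALUES (p.post_id, p.title, p.description, p.author,
                     SYSDATETIME(), NULL, 1, SUSER_SNAME(), SYSDATETIME());

    -- Open a new version row for every post whose previous version was just closed.
    INSERT INTO posts_history (post_id, title, description, author,
                               effective_date, end_date, version_number, created_by, created_at)
    SELECT p.post_id, p.title, p.description, p.author,
           SYSDATETIME(), NULL,
           1 + (SELECT MAX(h2.version_number)
                FROM posts_history AS h2
                WHERE h2.post_id = p.post_id),
           SUSER_SNAME(), SYSDATETIME()
    FROM   posts AS p
    WHERE  NOT EXISTS (SELECT 1 FROM posts_history AS h
                       WHERE h.post_id = p.post_id AND h.end_date IS NULL);

A trigger-based or stored-procedure-based version would perform the same close-and-insert on each change instead of in a nightly batch.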

I must admit that disk space is not a big consideration for these tables, so there is no problem with storing the data twice, once in the current table and once in the history. It would only be a minor tweak to keep superseded versions in the history table alone, with the current entries only in the "current" table.

One disadvantage of this approach is handling changes to the structure of the base table: if you want to add a column, you need to add it to the history table as well as to the base table.

+2

If the tables are used for reporting purposes (especially by business users, if they have some SQL access), I think it's better to move the old data out and put it in another table. Flags and revision columns in the main table are sometimes fine, but as soon as you need to do something along the lines of select sum(someVar) where revision_version = max(revision_version) and someID = ID , it really goes beyond simple.
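For example, pulling just the latest revision of each row out of a combined revisions table tends to end up looking something like this (an illustrative sketch against the hypothetical posts_revisions layout from the question, not necessarily the exact query the answer had in mind):

    -- Latest revision of each post, via a window function.
    SELECT id, title, description, author, revision_version
    FROM (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY r.id
                                  ORDER BY r.revision_version DESC) AS rn
        FROM posts_revisions AS r
    ) AS latest
    WHERE rn = 1;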

If you have a table that is used for quick-and-dirty data collection, replace the data in place and, if necessary, move the old rows into the revision table. If only an application will access it, and it is not a performance issue, then keeping everything in the main table is fine.
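A sketch of that replace-and-archive pattern, again using the hypothetical posts / posts_revisions tables, with placeholder values (post id 42, the 'editor_name' user); transaction syntax varies by database:

    BEGIN TRANSACTION;

    -- Park the row as it currently stands, stamped with the next revision number.
    INSERT INTO posts_revisions (id, title, description, created_at, author,
                                 revision_version, origin, changed_by, changed_at)
    SELECT p.id, p.title, p.description, p.created_at, p.author,
           COALESCE((SELECT MAX(r.revision_version)
                     FROM posts_revisions AS r
                     WHERE r.id = p.id), 0) + 1,
           'user',           -- origin: placeholder value
           'editor_name',    -- who made the correction: placeholder value
           CURRENT_TIMESTAMP
    FROM posts AS p
    WHERE p.id = 42;

    -- Then apply the correction to the main table.
    UPDATE posts
    SET    title = 'Corrected title'
    WHERE  id = 42;

    COMMIT;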

+1
