Database design with change history

I am looking for a database design that tracks every set of changes so I can come back to them in the future. For example:

Database A

 +====+======+==========+
 | ID | Name | Property |
 +====+======+==========+
 | 1  | Kyle | 30       |

If I change the "property" field of the row to 50, it should update the row to:

 | 1 | Kyle | 50 |

But you should save the fact that the row property was 30 at a certain point in time. Then, if the line is updated again to 70:

 | 1 | Kyle | 70 |

Both earlier values of the property (30 and then 50) should be preserved, so that with some query I could get:

 | 1 | Kyle | 30 |
 | 1 | Kyle | 50 |

In other words, these are "the same record", just at different points in time.

Edit: this history should be presented to the user at some point, so ideally there should be a way to tell which rows belong to the same "revision cluster".

What is the best approach to designing this database?

+12
4 answers

One way is to create a MyTableNameHistory table for each table MyTableName in your database, with a schema identical to MyTableName except for one extra column in the history table's primary key: effectiveUtc, a DateTime. For example, if you have a table named Employee,

 Create Table Employee (
     employeeId   integer Primary Key Not Null,
     firstName    varChar(20) null,
     lastName     varChar(30) Not null,
     HireDate     smallDateTime null,
     DepartmentId integer null
 )

Then the History table will be

 Create Table EmployeeHistory (
     employeeId   integer Not Null,
     effectiveUtc DateTime Not Null,
     firstName    varChar(20) null,
     lastName     varChar(30) Not null,
     HireDate     smallDateTime null,
     DepartmentId integer null,
     Primary Key (employeeId, effectiveUtc)
 )

You can then put triggers on the Employee table so that every time you insert, update, or delete something in Employee, a new record is inserted into EmployeeHistory with the same values for all the regular fields, plus the current UTC time in the effectiveUtc column.
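As a concrete illustration, here is a minimal sketch of this trigger approach using SQLite through Python's sqlite3. The DDL is a simplified adaptation of the answer's T-SQL-style schema (type names and trigger syntax differ by engine), and the short sleep exists only so the two demo rows get distinct timestamps:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    employeeId   INTEGER PRIMARY KEY NOT NULL,
    firstName    VARCHAR(20),
    lastName     VARCHAR(30) NOT NULL,
    hireDate     TEXT,
    departmentId INTEGER
);
CREATE TABLE EmployeeHistory (
    employeeId   INTEGER NOT NULL,
    effectiveUtc TEXT NOT NULL,
    firstName    VARCHAR(20),
    lastName     VARCHAR(30) NOT NULL,
    hireDate     TEXT,
    departmentId INTEGER,
    PRIMARY KEY (employeeId, effectiveUtc)
);
-- One trigger per operation; INSERT and UPDATE shown, DELETE would be analogous.
CREATE TRIGGER Employee_history_ins AFTER INSERT ON Employee
BEGIN
    INSERT INTO EmployeeHistory
    VALUES (NEW.employeeId, strftime('%Y-%m-%dT%H:%M:%fZ', 'now'),
            NEW.firstName, NEW.lastName, NEW.hireDate, NEW.departmentId);
END;
CREATE TRIGGER Employee_history_upd AFTER UPDATE ON Employee
BEGIN
    INSERT INTO EmployeeHistory
    VALUES (NEW.employeeId, strftime('%Y-%m-%dT%H:%M:%fZ', 'now'),
            NEW.firstName, NEW.lastName, NEW.hireDate, NEW.departmentId);
END;
""")

conn.execute("INSERT INTO Employee VALUES (1, 'Kyle', 'Smith', NULL, NULL)")
time.sleep(0.01)  # only so the two demo rows get distinct millisecond timestamps
conn.execute("UPDATE Employee SET departmentId = 7 WHERE employeeId = 1")

rows = conn.execute(
    "SELECT departmentId FROM EmployeeHistory ORDER BY effectiveUtc").fetchall()
print(rows)  # [(None,), (7,)] -- one history row per change
```

The application never writes to EmployeeHistory directly; the triggers guarantee that every change is recorded even if several code paths modify Employee.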

Then, to find the values at any point in the past, you simply select the history record whose effectiveUtc is the latest value before the asOf datetime you are interested in:

  Select *
  From EmployeeHistory h
  Where EmployeeId = @EmployeeId
    And effectiveUtc = (Select Max(effectiveUtc)
                        From EmployeeHistory
                        Where EmployeeId = h.EmployeeId
                          And effectiveUtc < @AsOfUtcDate)
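The as-of query can be exercised against a hand-filled history table. This sketch uses SQLite via Python's sqlite3 with simplified columns (the question's Name and Property rather than the full Employee schema) and explicit timestamps, both assumptions made for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeHistory (
        employeeId   INTEGER NOT NULL,
        effectiveUtc TEXT NOT NULL,
        name         TEXT,
        property     INTEGER,
        PRIMARY KEY (employeeId, effectiveUtc)
    )""")
conn.executemany("INSERT INTO EmployeeHistory VALUES (?, ?, ?, ?)", [
    (1, '2024-01-01T00:00:00Z', 'Kyle', 30),
    (1, '2024-02-01T00:00:00Z', 'Kyle', 50),
    (1, '2024-03-01T00:00:00Z', 'Kyle', 70),
])

# "What was the row as of 2024-02-15?" -> the latest history entry before that date.
row = conn.execute("""
    SELECT name, property FROM EmployeeHistory h
    WHERE employeeId = ?
      AND effectiveUtc = (SELECT MAX(effectiveUtc) FROM EmployeeHistory
                          WHERE employeeId = h.employeeId
                            AND effectiveUtc < ?)
""", (1, '2024-02-15T00:00:00Z')).fetchone()
print(row)  # ('Kyle', 50)
```

Because ISO-8601 UTC strings sort lexicographically in timestamp order, MAX() over the text column behaves correctly here.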
+13

To add to Charles's answer, I would use the Entity-Attribute-Value (EAV) model instead of creating a separate history table for every table in your database.

Essentially, you would create one History table as follows:

 Create Table History (
     tableId          varChar(64) Not Null,
     recordId         varChar(64) Not Null,
     changedAttribute varChar(64) Not Null,
     newValue         varChar(64) Not Null,
     effectiveUtc     DateTime Not Null,
     Primary Key (tableId, recordId, changedAttribute, effectiveUtc)
 )

You will then create a History record each time you create or modify data in one of your tables.

Following your example: when you add "Kyle" to the Employee table, you create two records (one for each non-id attribute), and then create a new record each time the property changes:

 History
 +==========+==========+==================+==========+==============+
 | tableId  | recordId | changedAttribute | newValue | effectiveUtc |
 +==========+==========+==================+==========+==============+
 | Employee | 1        | Name             | Kyle     | N            |
 | Employee | 1        | Property         | 30       | N            |
 | Employee | 1        | Property         | 50       | N+1          |
 | Employee | 1        | Property         | 70       | N+2          |
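A minimal sketch of this EAV history table, again using SQLite via Python's sqlite3 (the concrete dates stand in for the answer's N, N+1, N+2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE History (
        tableId          TEXT NOT NULL,
        recordId         TEXT NOT NULL,
        changedAttribute TEXT NOT NULL,
        newValue         TEXT NOT NULL,
        effectiveUtc     TEXT NOT NULL,
        PRIMARY KEY (tableId, recordId, changedAttribute, effectiveUtc)
    )""")
conn.executemany("INSERT INTO History VALUES (?, ?, ?, ?, ?)", [
    ('Employee', '1', 'Name',     'Kyle', '2024-01-01'),
    ('Employee', '1', 'Property', '30',   '2024-01-01'),
    ('Employee', '1', 'Property', '50',   '2024-02-01'),
    ('Employee', '1', 'Property', '70',   '2024-03-01'),
])

# Full change history of a single attribute, oldest first:
hist = [v for (v,) in conn.execute(
    "SELECT newValue FROM History "
    "WHERE tableId = 'Employee' AND recordId = '1' "
    "AND changedAttribute = 'Property' "
    "ORDER BY effectiveUtc")]
print(hist)  # ['30', '50', '70']
```

Note that newValue is text, so every attribute's value is stored stringified; that is the usual EAV trade-off (one generic table, weaker typing).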

Alternatively, as suggested by a_horse_with_no_name, if you do not want a separate History record for each field change, you can save grouped changes (such as setting Name to "Kyle" and Property to 30 in the same update) as a single record. In this case, you would express the collection of changes as JSON or some other BLOB format, combining changedAttribute and newValue into a single column (changedValues). For example:

 History
 +==========+==========+================================+==============+
 | tableId  | recordId | changedValues                  | effectiveUtc |
 +==========+==========+================================+==============+
 | Employee | 1        | { Name: 'Kyle', Property: 30 } | N            |
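A sketch of the grouped-changes variant, with the changed attributes serialized as JSON. The record_change helper is hypothetical, introduced only for this demo; it shows how replaying the change sets in timestamp order reconstructs the record as of any date:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE History (
        tableId       TEXT NOT NULL,
        recordId      TEXT NOT NULL,
        changedValues TEXT NOT NULL,  -- JSON blob: attribute -> new value
        effectiveUtc  TEXT NOT NULL,
        PRIMARY KEY (tableId, recordId, effectiveUtc)
    )""")

def record_change(table, record, changes, when):
    """Store one grouped change set as a single JSON history row."""
    conn.execute("INSERT INTO History VALUES (?, ?, ?, ?)",
                 (table, record, json.dumps(changes), when))

record_change('Employee', '1', {'Name': 'Kyle', 'Property': 30}, '2024-01-01')
record_change('Employee', '1', {'Property': 50}, '2024-02-01')
record_change('Employee', '1', {'Property': 70}, '2024-03-01')

# Replay the change sets up to a cutoff date to reconstruct the record's state:
state = {}
for (blob,) in conn.execute(
        "SELECT changedValues FROM History "
        "WHERE tableId = 'Employee' AND recordId = '1' AND effectiveUtc <= ? "
        "ORDER BY effectiveUtc", ('2024-02-15',)):
    state.update(json.loads(blob))
print(state)  # {'Name': 'Kyle', 'Property': 50}
```

Each history row here is exactly one "revision cluster" from the question's edit, which is the main appeal of grouping the changes.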

This may be more complicated than creating a history table for every table in your database, but it has several advantages:

  • adding new fields to your tables does not require adding those fields to a second table
  • fewer tables overall
  • it is easier to relate updates across different tables over time

One architectural advantage of this design is that it decouples your application's concerns from your history/audit capability. It could just as well run as a microservice backed by a relational database, or even by a NoSQL database separate from your application database.

+3

The best approach depends on what you are doing. You will want to take a deeper look at slowly changing dimensions:

https://en.wikipedia.org/wiki/Slowly_changing_dimension

In Postgres 9.2, don't miss the tsrange type either. It lets you combine start_date and end_date into a single column, index it with a GiST (or GIN) index, and add an exclusion constraint to prevent overlapping date ranges.
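The Postgres-specific machinery can't be shown portably here, but the underlying idea is engine-independent: each revision carries a half-open validity range that gets closed when the next revision arrives, and an as-of lookup finds the range containing a given instant. A plain-Python sketch of that idea (in Postgres, a tsrange column plus an EXCLUDE USING GIST constraint enforces the non-overlap declaratively):

```python
from datetime import datetime

# Each revision carries a half-open validity range [valid_from, valid_to);
# valid_to = None marks the current revision.
history = []

def update(value, when):
    if history:
        history[-1]['valid_to'] = when      # close the previous revision's range
    history.append({'value': value, 'valid_from': when, 'valid_to': None})

def as_of(when):
    for rev in history:
        if rev['valid_from'] <= when and (
                rev['valid_to'] is None or when < rev['valid_to']):
            return rev['value']

update(30, datetime(2024, 1, 1))
update(50, datetime(2024, 2, 1))
update(70, datetime(2024, 3, 1))
print(as_of(datetime(2024, 2, 15)))  # 50
```

The half-open convention means every instant falls into exactly one range, which is the same invariant the exclusion constraint guarantees in Postgres.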


Edit:

there must be an understanding of which rows belong to the same "revision cluster"

In that case, you need date ranges in your tables one way or another, rather than version numbers or live flags; otherwise you end up duplicating related data everywhere.

On a separate note, consider keeping the audit tables separate from the live data rather than storing both in a single table. It is harder to implement and manage, but it makes queries against the current data much more efficient.


See also this related post: Temporal database design, with a twist (live vs draft rows)

+1

One way to record all changes is to create so-called audit triggers. Such a trigger writes every change made to the table it is attached to into a separate log table (which you can then query to view the change history).

Details of the implementation here .

+1
