Implementing a database record hash to track whether a record has changed

I have a database schema for an integration project in which I need to be able to query records that have changed, but only based on a given set of fields inside this record.

For example, here is a sample table:

CLIENTS

  • ID
  • Name
  • Phone
  • Fax
  • Balance

I need to write a query that selects records whose Name, Phone, or Fax fields have changed. Other fields should not be taken into account: if only the Balance field changes, the query should not return the record. (This is why a timestamp field that updates automatically whenever the record changes does not work on its own.)

In addition, this has to run on several different databases and platforms, so triggers or similar features are not really an option unless they work in MySQL, PostgreSQL, SQL Server, and SQLite.

The fields are changed by a third-party application that I cannot modify, so I can't simply add a flag and have that application set it to TRUE whenever it changes one of the relevant fields.

My initial solution is to calculate a hash of the relevant fields and save it in a new "LastHash" field or something like that. Later I can calculate the hash of the same fields from the data currently in the record, and if it does not match the saved LastHash, I know the record has changed.
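A minimal sketch of that idea, computed on the application side so it does not depend on any database feature. The names `TRACKED_FIELDS`, `record_hash`, and `has_changed` are hypothetical, and SHA-1 is just one possible choice of digest:

```python
import hashlib

# Hypothetical set of tracked columns; Balance is deliberately left out.
TRACKED_FIELDS = ("Name", "Phone", "Fax")

def record_hash(record):
    """Return a hex digest over the tracked fields of one row (a dict)."""
    parts = []
    for field in TRACKED_FIELDS:
        value = record.get(field)
        # Keep NULL distinct from the empty string, and length-prefix each
        # value so ("ab", "c") and ("a", "bc") do not hash to the same thing.
        text = "\x00" if value is None else "v" + str(value)
        parts.append(f"{len(text)}:{text}")
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()

def has_changed(record, last_hash):
    """True when the tracked fields no longer match the stored LastHash."""
    return record_hash(record) != last_hash
```

With this shape, a change to Balance alone leaves the digest untouched, while any change to Name, Phone, or Fax produces a different one.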

It seems pretty dirty ... but it looks like it will work. Is there a better way? If not, is there a good way to implement this hash so that retrieving the modified records is efficient and not too time-consuming?

EDIT

Some clarifications: both my application and the other application update and insert into these tables. I can make my application calculate the initial hash. However, I cannot make the other application compute it.

A timestamp column that automatically updates each time a record is changed can be easily replicated across all database systems using different types of columns or very simple triggers.

ADDITIONAL QUESTION

If hashing is the way to go ... is there an efficient hashing algorithm that won't take forever to compute over all of these records? MD5 or SHA1 may work, but they seem like they would be slow.
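One way to approach this question is to measure rather than guess. The sketch below times a few standard-library candidates on a sample payload; `time_hashes` and the sample string are made up for illustration, and the results will vary by machine. Note that CRC32 yields only 32 bits, so accidental collisions are far more likely than with MD5 or SHA-1.

```python
import hashlib
import timeit
import zlib

def time_hashes(payload, repeat=10000):
    """Return seconds taken to hash `payload` `repeat` times, per algorithm."""
    candidates = {
        "md5": lambda: hashlib.md5(payload).digest(),
        "sha1": lambda: hashlib.sha1(payload).digest(),
        "crc32": lambda: zlib.crc32(payload),
    }
    return {name: timeit.timeit(fn, number=repeat)
            for name, fn in candidates.items()}

if __name__ == "__main__":
    # Hypothetical concatenation of Name, Phone and Fax for one record.
    sample = "Acme Ltd|555-0100|555-0101".encode("utf-8")
    for name, seconds in time_hashes(sample).items():
        print(f"{name}: {seconds:.4f}s")
```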

+6
database hash
2 answers

That's cool. You still have to scan the tables (or do an index scan), since you need to calculate the new hash and compare it with the old one.

If triggers are not possible because of cross-platform issues, you can have the database itself compute the current hash (e.g., a persisted computed column is about as efficient as a trigger). However, this is a cross-platform issue too! Then, if you index both the current hash and your saved hash, finding the changed rows is a relatively simple search.

Can you at least use the timestamp field to reduce the number of hashes you need to check?
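A rough sketch of that pre-filter, using SQLite through Python's `sqlite3` module purely for illustration. The `Updated` and `LastHash` columns and both function names are assumptions, and NULL is treated the same as an empty string here for simplicity:

```python
import hashlib
import sqlite3

def tracked_hash(name, phone, fax):
    """Digest over the three tracked fields (NULL treated as empty)."""
    joined = "\x1f".join("" if v is None else str(v) for v in (name, phone, fax))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def changed_ids(conn, last_sync):
    """IDs touched since `last_sync` whose tracked fields really changed."""
    # The timestamp narrows the candidates; the hash removes rows where
    # only untracked fields such as Balance were modified.
    rows = conn.execute(
        "SELECT ID, Name, Phone, Fax, LastHash FROM CLIENTS "
        "WHERE Updated > ? ORDER BY ID",
        (last_sync,),
    )
    return [r[0] for r in rows if tracked_hash(r[1], r[2], r[3]) != r[4]]
```

Only rows modified since the last check are hashed, so the cost scales with the number of recent updates rather than with the size of the table.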

Another thing to keep in mind is that there is no such thing as a perfect hash function, so you might get false negatives (an accidental hash collision causes a change to go undetected). Is this (astronomically small) risk worth taking?
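To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes the hash behaves like a uniformly random function, so a single change goes unnoticed with probability 2^-bits, and it uses a simple union bound over many changes. The function name is hypothetical.

```python
def false_negative_probability(bits, changes):
    """Upper bound on missing at least one of `changes` real updates.

    Assumes each changed record collides with its old hash independently
    with probability 2**-bits (an idealised model, not a guarantee).
    """
    return min(1.0, changes * 2.0 ** -bits)

if __name__ == "__main__":
    for bits in (32, 128, 160):  # CRC32, MD5 and SHA-1 output sizes
        print(bits, false_negative_probability(bits, 1_000_000))
```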

+2

I would standardize how your application checks for the difference, not how the database implements it. Try something like a view with a specific column that indicates a change. Then use whatever tricks each database offers to make this view a reality. The code that depends on checking for the difference stays the same, because it always uses the same view and column.

0
