Is it possible to use only 3 timestamps for a bitemporal SQL database?

When implementing bitemporal databases in SQL, it is generally recommended to use the following timestamps:

  • ValidStart
  • ValidEnd
  • TransactionStart
  • TransactionEnd

I have used this approach several times before, but I have always wondered why using only three timestamps, leaving out TransactionEnd, is not an equally valid implementation. In that case, the transaction time interval would span from one TransactionStart until the next TransactionStart.

Are there strong arguments against using only three timestamps, which would limit the size of the database?

+5
2 answers

As mentioned in the comments, the fourth timestamp is there for simplicity: doing without it is somewhat more difficult.

Consider the following example. John was born in some place, Location1, on the first of January 1990, but his birth was not registered until the fifth.

The Persons database table now looks like this:

 +----------+--------------+------------+----------+------------+----------+
 | Name     | Location     | valid_from | valid_to | trans_from | trans_to |
 +----------+--------------+------------+----------+------------+----------+
 | John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |99-99-9999|
 +----------+--------------+------------+----------+------------+----------+

Removing the trans_to column at this point would not cause too many problems, but suppose the following:

A few years later, say 20, John moves to Location2 and informs the officials 20 days after the move. This makes the Persons table look like this:

 +----------+--------------+------------+----------+------------+----------+
 | Name     | Location     | valid_from | valid_to | trans_from | trans_to |
 +----------+--------------+------------+----------+------------+----------+
 | John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |20-01-2010|
 | John     | Location1    | 01-01-1990 |01-01-2010| 20/01/2010 |99-99-9999|
 | John     | Location2    | 01-01-2010 |99-99-9999| 20/01/2010 |99-99-9999|
 +----------+--------------+------------+----------+------------+----------+

Suppose someone wants to know where the system thinks John is living now (transaction time), regardless of where he actually lives. This can be (roughly) queried in SQL as follows:

 SELECT Location
 FROM Persons
 WHERE Name = 'John'
   AND trans_from <= CURRENT_TIMESTAMP  -- recorded at or before now
   AND trans_to > CURRENT_TIMESTAMP     -- and not yet superseded
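To make the four-timestamp query concrete, here is a minimal sketch using Python's standard sqlite3 module. The choice of SQLite, the ISO date strings, the `9999-12-31` sentinel standing in for 99-99-9999, and the helper name `believed_rows` are all assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

OPEN = "9999-12-31"  # assumed sentinel standing in for 99-99-9999

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Persons (
        Name TEXT, Location TEXT,
        valid_from TEXT, valid_to TEXT,
        trans_from TEXT, trans_to TEXT
    )
    """
)
# The three rows from the table above, with dates rewritten as ISO strings
# so that plain string comparison orders them correctly.
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("John", "Location1", "1990-01-01", OPEN, "1990-01-05", "2010-01-20"),
        ("John", "Location1", "1990-01-01", "2010-01-01", "2010-01-20", OPEN),
        ("John", "Location2", "2010-01-01", OPEN, "2010-01-20", OPEN),
    ],
)


def believed_rows(conn, name, as_of):
    """Return the rows the system held as true at transaction time `as_of`."""
    return conn.execute(
        """
        SELECT Location, valid_from, valid_to
        FROM Persons
        WHERE Name = ? AND trans_from <= ? AND trans_to > ?
        ORDER BY valid_from
        """,
        (name, as_of, as_of),
    ).fetchall()


print(believed_rows(conn, "John", "2000-06-01"))
print(believed_rows(conn, "John", "2024-01-01"))
```

Each row carries its own transaction interval, so the filter is a pair of comparisons on that row alone. Note that for a transaction time after the move is recorded, two rows match (one per valid-time period); narrowing to a single location would need an additional valid-time condition.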

Now assume the transaction end time (trans_to) has been removed:

 +----------+--------------+------------+----------+------------+
 | Name     | Location     | valid_from | valid_to | trans_from |
 +----------+--------------+------------+----------+------------+
 | John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |
 | John     | Location1    | 01-01-1990 |01-01-2010| 20/01/2010 |
 | John     | Location2    | 01-01-2010 |99-99-9999| 20/01/2010 |
 +----------+--------------+------------+----------+------------+

The above query is, of course, no longer valid, and expressing the same logic against this last table is somewhat more difficult. Since trans_to is absent, it must be derived from other rows in the table. For example, the implicit trans_to for the first row (the oldest record) is the trans_from of the second row, which is the newer of the two.

Thus, the transaction end time is either 99-99-9999, if the row is the newest, or the trans_from of the row that immediately succeeds it.

This means that the data related to a particular row is not completely stored in that row, and the rows depend on each other, which is (of course) undesirable. In addition, it can be quite difficult to determine which row is the immediate successor of a given row, which can make queries even more complex.
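The extra work can be sketched as follows, again with Python's sqlite3 module. The successor rule used here is an assumption, not something the answer specifies: a row is taken to be superseded by the earliest later trans_from recorded for the same Name, which only holds if every new transaction restates all of that person's current rows. The `9999-12-31` sentinel and the function name `derive_trans_to` are likewise hypothetical.

```python
import sqlite3

OPEN = "9999-12-31"  # assumed sentinel standing in for 99-99-9999

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Persons (
        Name TEXT, Location TEXT,
        valid_from TEXT, valid_to TEXT,
        trans_from TEXT
    )
    """
)
# The three-timestamp table: no trans_to column is stored.
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        ("John", "Location1", "1990-01-01", OPEN, "1990-01-05"),
        ("John", "Location1", "1990-01-01", "2010-01-01", "2010-01-20"),
        ("John", "Location2", "2010-01-01", OPEN, "2010-01-20"),
    ],
)


def derive_trans_to(conn):
    """Rebuild the implicit trans_to of every row from the other rows.

    Assumed rule: a row ends at the earliest later trans_from recorded
    for the same Name, or stays open if no later transaction exists.
    """
    return conn.execute(
        """
        SELECT p.Name, p.Location, p.valid_from, p.valid_to, p.trans_from,
               COALESCE(
                   (SELECT MIN(n.trans_from)
                    FROM Persons AS n
                    WHERE n.Name = p.Name
                      AND n.trans_from > p.trans_from),
                   ?
               ) AS trans_to
        FROM Persons AS p
        ORDER BY p.trans_from, p.valid_from
        """,
        (OPEN,),
    ).fetchall()


for row in derive_trans_to(conn):
    print(row)
```

Every query that needs the transaction interval now has to run a correlated subquery (or an equivalent self-join) over the other rows. A simple "next row" window function would not be enough here, because two rows share the same trans_from and neither supersedes the other.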

+3

An example of using only one timestamp instead of two in a 1D time database:

I have a store and I want to record when user X was in my store.

If I use a model with start and end times, this information can be written as

 X,1,2
 X,3,4

So user X was in my store between 1 and 2 and between 3 and 4. This is clear, simple and concise.

If I model my data with only the start time as a timestamp, I will have:

 X,1
 X,2
 X,3
 X,4

But how can I interpret this data? X from (1,2) and X from (3,4)? Or X from (2,3) and X from (1,4)? Or X from (1,2), (2,3), (3,4)? Is X really there from (4, inf)?

To understand this data, I need to add extra restrictions, logic or information to my data or code: maybe the intervals do not overlap, maybe I add an identifier to each object, and so on. None of these solutions works in all cases; they are difficult to maintain and bring other problems.
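The ambiguity can be made concrete with a small Python sketch. Both function names are hypothetical, and each encodes one possible extra assumption about what a bare timestamp means; the data alone does not say which one is right.

```python
def pair_alternating(timestamps):
    """Assume timestamps alternate between entering and leaving the store."""
    ts = sorted(timestamps)
    if len(ts) % 2:
        raise ValueError("odd number of timestamps: the last visit has no end")
    # Pair the 1st with the 2nd, the 3rd with the 4th, and so on.
    return list(zip(ts[0::2], ts[1::2]))


def chain_consecutive(timestamps):
    """Assume each timestamp starts an interval that ends at the next one."""
    ts = sorted(timestamps)
    return list(zip(ts, ts[1:]))


visits = [1, 2, 3, 4]
print(pair_alternating(visits))   # [(1, 2), (3, 4)]
print(chain_consecutive(visits))  # [(1, 2), (2, 3), (3, 4)]
```

The same four rows yield different histories depending on which rule the reading code applies, so the rule has to live somewhere outside the data.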

For example, if I add id (a, b in this case) to each element, this will result in:

 X,a,1
 X,a,2
 X,b,3
 X,b,4

Instead of storing my data in 2 rows and 3 columns, I now store it in 4 rows and 3 columns. Not only do I gain nothing from this model, but it can also be reduced to:

 X,a, 1,2
 X,b, 3,4

which can be further reduced to:

 X, 1,2
 X, 3,4
+1