Firstly, I see no problem with describing these measure and fact tables outside of the warehouse :)
In terms of conceptualising and understanding the relationships, I personally find that start/end dates are very easy for people to reason about: fact tables for the Agents and the Locations, and then time-dependent mapping tables such as Agent_At_Location, etc. They do, however, have a couple of problems that deserve attention.
1. If EndDate is 2008-08-30, was the employee at that location until the 30th of August, or up to and including the 30th of August?
2. Working with overlapping date ranges in queries can produce awkward queries, but more importantly, slow queries.
The first appears to be just a matter of convention, but it has knock-on effects when working with other data. Suppose, for example, that an EndDate of 2008-08-30 means the agent was at that location up to and including the 30th of August. Now join that against the agent's daily data for that day (when they actually arrived at work, took breaks, etc.). You need to join ON AgentDailyData.EventTimeStamp < '2008-08-30' + 1 to include all the events that occurred on that day.
This is because the EventTimeStamp data is not at day granularity, but more likely at minute or second granularity.
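Something like this (a minimal T-SQL sketch; Agent_At_Location is from above, but the AgentId column and the exact shape of AgentDailyData are assumptions for illustration):

```sql
-- Sketch only: with an INCLUSIVE EndDate, every boundary needs "+ 1 day".
-- AgentId and the table shapes are assumed for illustration.
SELECT  d.AgentId, d.EventTimeStamp, l.LocationId
FROM    Agent_At_Location AS l
INNER JOIN AgentDailyData AS d
        ON  d.AgentId        =  l.AgentId
        AND d.EventTimeStamp >= l.EffectiveDate
        AND d.EventTimeStamp <  DATEADD(DAY, 1, l.EndDate)  -- the "+ 1"
```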
If, however, you take an EndDate of '2008-08-30' to mean that the Agent was at that location up to but NOT including the 30th of August, the + 1 is not needed in the join. In fact, you don't even need to know whether the field is at day granularity or whether it includes a time component; you simply need TimeStamp < EndDate.
By using EXCLUSIVE end markers, all of your queries are simplified, and you never need + 1 day or + 1 hour to deal with boundary conditions.
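The same join with an exclusive EndDate needs no date arithmetic at all (same assumed tables as the sketch above):

```sql
-- Sketch only: with an EXCLUSIVE EndDate there is no boundary arithmetic,
-- and it works identically for dates and datetimes.
SELECT  d.AgentId, d.EventTimeStamp, l.LocationId
FROM    Agent_At_Location AS l
INNER JOIN AgentDailyData AS d
        ON  d.AgentId        =  l.AgentId
        AND d.EventTimeStamp >= l.EffectiveDate
        AND d.EventTimeStamp <  l.EndDate
```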
The second problem is much harder to deal with. The simplest way of resolving overlapping periods is something like this:
```sql
SELECT
    CASE WHEN TableA.InclusiveFrom > TableB.InclusiveFrom
         THEN TableA.InclusiveFrom
         ELSE TableB.InclusiveFrom END AS [NetInclusiveFrom],
    CASE WHEN TableA.ExclusiveTo < TableB.ExclusiveTo
         THEN TableA.ExclusiveTo
         ELSE TableB.ExclusiveTo END AS [NetExclusiveTo]
FROM
    TableA
INNER JOIN
    TableB
        ON  TableA.InclusiveFrom < TableB.ExclusiveTo
        AND TableA.ExclusiveTo   > TableB.InclusiveFrom
```
The problem with this query is indexing. The first condition, TableA.InclusiveFrom < TableB.ExclusiveTo, can be resolved using an index, but it can match a massive range of dates. And then, for each of those rows, ExclusiveTo could be almost anything, and certainly not in an order that helps quickly resolve TableA.ExclusiveTo > TableB.InclusiveFrom.
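To make that concrete, even an index covering both columns (a hypothetical example) only helps with the seek on the first predicate; the second predicate degrades to a residual filter checked row by row:

```sql
-- Hypothetical index: the engine can seek on "InclusiveFrom < @x",
-- but within that range the ExclusiveTo values are in no useful order,
-- so "ExclusiveTo > @y" must be tested against every matched row.
CREATE INDEX IX_TableA_Range ON TableA (InclusiveFrom, ExclusiveTo);
```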
The solution I have used for this in the past is to enforce a maximum allowed gap between InclusiveFrom and ExclusiveTo. That allows something like...
```sql
ON  TableA.InclusiveFrom <  TableB.ExclusiveTo
AND TableA.InclusiveFrom >= TableB.InclusiveFrom - 30
AND TableA.ExclusiveTo   >  TableB.InclusiveFrom
```
The condition TableA.ExclusiveTo > TableB.InclusiveFrom STILL can't use an index. But we have now bounded the rows returned by the seek on TableA.InclusiveFrom to a window of no more than 30 days, because we know every duration is limited to 30 days.
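One way to guarantee that cap at the schema level is a check constraint (a sketch; the constraint name is made up):

```sql
-- Sketch: enforce the 30-day cap that makes the "- 30" trick safe to rely on.
ALTER TABLE TableA
    ADD CONSTRAINT CK_TableA_MaxDuration
    CHECK (DATEDIFF(DAY, InclusiveFrom, ExclusiveTo) <= 30);
```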
An example of this is breaking the associations down by calendar month (giving a maximum duration of 31 days):
```
EmployeeId | LocationId | EffectiveDate | EndDate
    1      |     2      |  2007-04-01   | 2007-05-01
    1      |     2      |  2007-05-01   | 2007-06-01
    1      |     2      |  2007-06-01   | 2007-06-25
```

(Representing Employee 1 being at Location 2 from the 1st of April up to, but not including, the 25th of June.)
This is an effective compromise: using disk space to improve performance.
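For completeness, the monthly slices could be generated with something like this (a T-SQL sketch; EmployeeLocation is a hypothetical table holding the original, un-split ranges):

```sql
-- Sketch: split each (EffectiveDate, EndDate) range into calendar-month
-- slices with a recursive CTE. EmployeeLocation is a hypothetical table.
WITH Slices AS (
    SELECT  EmployeeId, LocationId,
            EffectiveDate AS SliceFrom,
            CASE WHEN DATEADD(MONTH, DATEDIFF(MONTH, 0, EffectiveDate) + 1, 0) < EndDate
                 THEN DATEADD(MONTH, DATEDIFF(MONTH, 0, EffectiveDate) + 1, 0)  -- first day of next month
                 ELSE EndDate END AS SliceTo,
            EndDate
    FROM    EmployeeLocation
    UNION ALL
    SELECT  EmployeeId, LocationId,
            SliceTo,                                  -- previous slice's end is the next slice's start
            CASE WHEN DATEADD(MONTH, 1, SliceTo) < EndDate
                 THEN DATEADD(MONTH, 1, SliceTo)
                 ELSE EndDate END,
            EndDate
    FROM    Slices
    WHERE   SliceTo < EndDate
)
SELECT  EmployeeId, LocationId,
        SliceFrom AS EffectiveDate,
        SliceTo   AS EndDate
FROM    Slices
ORDER BY EmployeeId, EffectiveDate;
```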
I have even seen this taken to the extreme of not storing date ranges at all, but instead storing the actual mapping for each individual day. Essentially, it's like limiting the maximum duration to 1 day...
```
EmployeeId | LocationId | EffectiveDate
    1      |     2      |  2007-06-23
    1      |     2      |  2007-06-24
    1      |     3      |  2007-06-25
    1      |     3      |  2007-06-26
```
Instinctively, I rebelled against this at first. But in subsequent ETL, warehousing, reporting, etc., I actually found it incredibly powerful, adaptable, and maintainable. I saw people make fewer mistakes, write code in less time, and the code run faster, and it was much better able to adapt to changing customer needs.
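The reason it simplifies so much is that every range join collapses into a plain equality join (a sketch; DailyFact and EmployeeLocationDay are hypothetical day-grain tables following the layout above):

```sql
-- Sketch: with a one-row-per-day mapping, the range predicate disappears
-- entirely; the join is a plain equi-join that ordinary indexes satisfy.
SELECT  f.EmployeeId, f.EventDate, m.LocationId
FROM    DailyFact AS f                      -- hypothetical day-grain fact table
INNER JOIN EmployeeLocationDay AS m         -- hypothetical day-grain mapping
        ON  m.EmployeeId    = f.EmployeeId
        AND m.EffectiveDate = f.EventDate
```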
The only two downsides were:
1. More disk space (but trivial compared to the size of the fact table)
2. Inserts and updates for this mapping were slower
The slowdown on inserts and updates only really mattered once, when this model was used to represent a constantly changing network of processes, where the application wanted to change the mapping around 30 times per second. Even then it worked; it just used up more CPU time than was ideal.