SQL: the best way to build a timeline from two history tables

Given the following:

CREATE TABLE Members (MemberID INT) INSERT Members VALUES (1001) CREATE TABLE PCPs (PCPID INT) INSERT PCPs VALUES (231) INSERT PCPs VALUES (327) INSERT PCPs VALUES (390) CREATE TABLE Plans (PlanID INT) INSERT Plans VALUES (555) INSERT Plans VALUES (762) CREATE TABLE MemberPCP ( MemberID INT , PCP INT , StartDate DATETIME , EndDate DATETIME) INSERT MemberPCP VALUES (1001, 231, '2002-01-01', '2002-06-30') INSERT MemberPCP VALUES (1001, 327, '2002-07-01', '2003-05-31') INSERT MemberPCP VALUES (1001, 390, '2003-06-01', '2003-12-31') CREATE TABLE MemberPlans ( MemberID INT , PlanID INT , StartDate DATETIME , EndDate DATETIME) INSERT MemberPlans VALUES (1001, 555, '2002-01-01', '2003-03-31') INSERT MemberPlans VALUES (1001, 762, '2003-04-01', '2003-12-31') 

I am looking for a clean way to build a time reference for Member / PCP / Plan relationships, where a change in the PCP or plan for a member will result in a separate start / end line as a result. For example, if after a few years a member changed his PCP twice and their plan once, but each on different dates, I would see something like the following:

 MemberID PCP PlanID StartDate EndDate 1001 231 555 2002-01-01 2002-06-30 1001 327 555 2002-07-01 2003-03-31 1001 327 762 2003-04-01 2003-05-31 1001 390 762 2003-06-01 2003-12-31 

As you can see, I need a separate row of results for each date period, which is related to the difference in the Member / PCP / Plan association. I have a solution in place, but it is very confusing with the many CASE statements and conditional logic in the WHERE clause. I just think there is a much easier way to do this.

Thanks.

+7
source share
4 answers

My approach is to select a unique combination of start dates for each member as a starting point, and then collect the other parts of the query from it:

 -- -- Traverse down a list of -- unique Member ID and StartDates -- -- For each row find the most -- recent PCP for that member -- which started on or before -- the start date of the current -- row in the traversal -- -- For each row find the most -- recent PlanID for that member -- which started on or before -- the start date of the current -- row in the traversal -- -- For each row find the earliest -- end date for that member -- (from a collection of unique -- member end dates) that happened -- after the start date of the -- current row in the traversal -- SELECT MemberID, (SELECT TOP 1 PCP FROM MemberPCP WHERE MemberID = s.MemberID AND StartDate <= s.StartDate ORDER BY StartDate DESC ) AS PCP, (SELECT TOP 1 PlanID FROM MemberPlans WHERE MemberID = s.MemberID AND StartDate <= s.StartDate ORDER BY StartDate DESC ) AS PlanID, StartDate, (SELECT TOP 1 EndDate FROM ( SELECT MemberID, EndDate FROM MemberPlans UNION SELECT MemberID, EndDate FROM MemberPCP) e WHERE EndDate >= s.StartDate ORDER BY EndDate ) AS EndDate FROM ( SELECT MemberID, StartDate FROM MemberPlans UNION SELECT MemberID, Startdate FROM MemberPCP ) s ORDER BY StartDate 
0
source

T-SQL compatible. I agree with Glenn regarding a general approach.

Another suggestion: if you allow flights between periods in your business, this code needs further configuration. Otherwise, I think that delaying the EndDate value from the next StartDate will be better for more controlled behavior from your code. In this case, you must ensure the rule before the data gets to this request.

Edit: I just found out how, using the statement and SQL Fiddle from Andriy M.'s message, you can see my answer in SQL Fiddle .

Edit: Fixed a bug noted by Andrey.

 WITH StartDates AS ( SELECT MemberId, StartDate FROM MemberPCP UNION SELECT MemberId, StartDate FROM MemberPlans UNION SELECT MemberId, EndDate + 1 FROM MemberPCP UNION SELECT MemberId, EndDate + 1 FROM MemberPlans ), EndDates AS ( SELECT MemberId, EndDate = StartDate - 1 FROM MemberPCP UNION SELECT MemberId, StartDate - 1 FROM MemberPlans UNION SELECT MemberId, EndDate FROM MemberPCP UNION SELECT MemberId, EndDate FROM MemberPlans ), Periods AS ( SELECT s.MemberId, s.StartDate, EndDate = min(e.EndDate) FROM StartDates s INNER JOIN EndDates e ON s.StartDate <= e.EndDate AND s.MemberId = e.MemberId GROUP BY s.MemberId, s.StartDate ) SELECT MemberId = p.MemberId, pcp.PCP, pl.PlanId, p.StartDate, p.EndDate FROM Periods p LEFT JOIN MemberPCP pcp -- because of the way we divided period, -- there will be one and only one record that fits this join clause ON p.StartDate >= pcp.StartDate AND p.EndDate <= pcp.EndDate AND p.MemberId = pcp.MemberId LEFT JOIN MemberPlans pl ON p.StartDate >= pl.StartDate AND p.EndDate <= pl.EndDate AND p.MemberId = pl.MemberId ORDER BY p.MemberId, p.StartDate 
+1
source

As perhaps not the most effective, but at least a simple and simple solution, I would do the following:

  • 1) expand the ranges;

  • 2) join extended ranges;

  • 3) group the results.

This, of course, assumes that only dates are used (i.e. the 00:00 time part for all StartDate and EndDate in both tables).

To extend date ranges, I prefer to use the numbers table, for example:

 SELECT m.MemberID, m.PCP, Date = DATEADD(DAY, n.Number, m.StartDate) FROM MemberPCP m INNER JOIN Numbers n ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate) 

And similarly for MemberPlans .

To create a combined rowset, I would use a FULL JOIN , although if you know in advance that both tables span exactly the same period of time, INNER JOIN will do the same:

 SELECT * FROM MemberPCPExpanded pcp FULL JOIN MemberPlansExpanded plans ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date 

Now you only need to group the resulting rows and find the minimum and maximum dates for each combination (MemberID, PCP, PlanID) :

 SELECT MemberID = ISNULL(pcp.MemberID, plans.MemberID),, pcp.PCP, plans.PlanID, StartDate = MIN(ISNULL(pcp.Date, plans.Date)), EndDate = MAX(ISNULL(pcp.Date, plans.Date)) FROM MemberPCPExpanded pcp FULL JOIN MemberPlansExpanded plans ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date GROUP BY ISNULL(pcp.MemberID, plans.MemberID), pcp.PCP, plans.PlanID 

Please note that if you use INNER JOIN instead of FULL JOIN , you do not need all these ISNULL() expressions, just select a table column, for example. pcp.MemberID instead of ISNULL(pcp.MemberID, plans.MemberID) and pcp.Date instead of ISNULL(pcp.Date, plans.Date) .

A full query might look like this:

 WITH MemberPCPExpanded AS ( SELECT m.MemberID, m.PCP, Date = DATEADD(DAY, n.Number, m.StartDate) FROM MemberPCP m INNER JOIN Numbers n ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate) ), MemberPlansExpanded AS ( SELECT m.MemberID, m.PlanID, Date = DATEADD(DAY, n.Number, m.StartDate) FROM MemberPlans m INNER JOIN Numbers n ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate) ) SELECT MemberID = ISNULL(pcp.MemberID, plans.MemberID), pcp.PCP, plans.PlanID, StartDate = MIN(ISNULL(pcp.Date, plans.Date)), EndDate = MAX(ISNULL(pcp.Date, plans.Date)) FROM MemberPCPExpanded pcp FULL JOIN MemberPlansExpanded plans ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date GROUP BY ISNULL(pcp.MemberID, plans.MemberID), pcp.PCP, plans.PlanID ORDER BY MemberID, StartDate 

You can try this query in SQL Fiddle .

+1
source

Perhaps this will provide some ideas to get you started:

 SELECT y.memberid, y.pcp, z.planid, x.startdate, x.enddate FROM ( WITH startdates AS ( SELECT startdate FROM memberpcp UNION SELECT startdate FROM memberplans UNION SELECT enddate + 1 FROM memberpcp UNION SELECT enddate + 1 FROM memberplans ), enddates AS ( SELECT enddate FROM memberpcp UNION SELECT enddate FROM memberplans ) SELECT s.startdate, e.enddate FROM startdates s ,enddates e WHERE e.enddate = (SELECT MIN(enddate) FROM enddates WHERE enddate > s.startdate) ) x ,memberpcp y ,memberplans z WHERE (y.startdate, y.enddate) = (SELECT startdate, enddate FROM memberpcp WHERE startdate <= x.startdate AND enddate >= x.enddate) AND (z.startdate, z.enddate) = (SELECT startdate, enddate FROM memberplans WHERE startdate <= x.startdate AND enddate >= x.enddate) 

I came across Oracle with these results:

 1001 231 555 01-JAN-02 30-JUN-02 1001 327 555 01-JUL-02 31-MAR-03 1001 327 762 01-APR-03 31-MAY-03 1001 390 762 01-JUN-03 31-DEC-03 

The idea was to first define different date ranges. This is in article "C". Then search for each range in the other tables. Many assumptions here regarding overlapping ranges, etc. But perhaps this is the beginning. I tried looking at this without analytic functions, since there cannot be good support for analytic functions with tsql? I dont know. When constructing date ranges for a real range, ranges must also be created by the member.

0
source

All Articles