Joining tables in SQL with the SUM function

Question


I work with SQL Server 2012 and have the following tables: Ownership, Property, Person.

The Person table contains information about people, such as first name and last name, and this table has a PersonId as its primary key.

The Property table contains information about a property, such as the property area and a description of the property, and this table has a PropertyId as its primary key.

Since each person can own more than one property, and each property can belong to more than one person, we have a many-to-many relationship between Person and Property.

So, I created an Ownership table to break up this relationship. This table has PersonId and PropertyId as foreign keys and the following columns: PropertyId as "Primary Key", StartDate, EndDate and OwnershipPercent.

StartDate and EndDate refer to the period during which the property belongs to a person, and OwnershipPercent refers to that person's ownership share of the property.

Now I would like to write a query that returns any property whose owners (more than one person) together hold more than 100% of it at the same time.

For instance:

The property with Id = 1 belongs to person No. 1 from 1-1-2010 to 1-1-2012 with a 90% share, and it also belongs to person No. 2 from 1-1-2010 to 1-1-2012 with an 80% share. Added together, that gives 90 + 80 = 170% at the same time, which is wrong (the total must not exceed 100% at any one time).

I wrote the following query:

    SELECT A.PropertyId
    FROM Ownership A
    INNER JOIN Ownership B
        ON a.PersonId <> b.PersonId
       AND A.PropertyId = B.PropertyId
       AND A.StartDate <= B.EndDate
       AND A.EndDate >= B.StartDate
    GROUP BY A.PropertyId
    HAVING (SUM(A.OwnershipPercent)) <= 100;

But if a property belongs to 5 people, the join produces 5 × 4 = 20 rows, so the sum is taken over 20 values instead of 5, and the result is wrong.

How can I fix it?


sql inner-join sql-server-2012

Mohamad ghanem Jan 2 '13 at 12:25


4 answers

Gordon Linoff · Answer 1 · 2013-01-02T15:03:33+0000

I think the approach of self-joining the Ownership table is not quite right. I see what you are trying to do, but the join creates pairs of owners. Instead, you want to think about sets of owners.
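
To make the pairing effect concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the five owners with 20% each are made-up data, not rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ownership (PersonId INTEGER, PropertyId INTEGER, "
    "StartDate TEXT, EndDate TEXT, OwnershipPercent INTEGER)"
)
# Hypothetical data: five people each own 20% of property 1 over the same period.
conn.executemany(
    "INSERT INTO Ownership VALUES (?, 1, '2010-01-01', '2012-01-01', 20)",
    [(person_id,) for person_id in range(1, 6)],
)

# The self-join from the question: every owner is paired with every other owner.
pair_rows, pair_sum = conn.execute(
    """
    SELECT COUNT(*), SUM(A.OwnershipPercent)
    FROM Ownership A
    INNER JOIN Ownership B
        ON A.PersonId <> B.PersonId
       AND A.PropertyId = B.PropertyId
       AND A.StartDate <= B.EndDate
       AND A.EndDate >= B.StartDate
    """
).fetchone()

# The sum actually wanted: each owner counted once.
true_sum = conn.execute("SELECT SUM(OwnershipPercent) FROM Ownership").fetchone()[0]

print(pair_rows, pair_sum, true_sum)  # 20 400 100
```

Each of the 5 rows is matched with the 4 others, so the joined sum is 4 times the real total.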

My approach is to create a table with all the important dates for each property. These are the StartDate and EndDate values in the Ownership table. Then, look at the total ownership percentage on each of these dates:

    select os.PropertyId, thedate, SUM(os.OwnershipPercent)
    from ((select PropertyId, StartDate as thedate
           from ownership
          ) union
          (select PropertyId, EndDate
           from ownership
          )
         ) driver join
         OwnerShip os
         on driver.PropertyId = os.PropertyId and
            driver.thedate between os.StartDate and os.EndDate
    group by os.PropertyId, thedate
    having SUM(os.OwnershipPercent) <= 100 -- Do you really want > 100 here?

One key difference is that this query aggregates by PropertyId and date. This is reasonable because the amount of ownership can change over time.
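
As a rough check of this idea, the sketch below runs the same kind of query with the condition flipped to > 100. It assumes Python's built-in sqlite3 module instead of SQL Server and stores dates as ISO strings; property 1 is the example from the question, while property 2 is hypothetical data added to show a property that is not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ownership (PersonId INTEGER, PropertyId INTEGER, "
    "StartDate TEXT, EndDate TEXT, OwnershipPercent INTEGER)"
)
conn.executemany(
    "INSERT INTO Ownership VALUES (?, ?, ?, ?, ?)",
    [
        # Property 1: the question's example, 90% + 80% over the same period.
        (1, 1, "2010-01-01", "2012-01-01", 90),
        (2, 1, "2010-01-01", "2012-01-01", 80),
        # Property 2 (hypothetical): two owners whose periods never overlap.
        (3, 2, "2010-01-01", "2010-12-31", 60),
        (4, 2, "2011-01-01", "2011-12-31", 60),
    ],
)

# Collect every start/end date per property, then sum the shares of all
# ownership rows that are active on each of those dates.
over_allocated = conn.execute(
    """
    SELECT os.PropertyId, driver.thedate, SUM(os.OwnershipPercent) AS TotalPercent
    FROM (
        SELECT PropertyId, StartDate AS thedate FROM Ownership
        UNION
        SELECT PropertyId, EndDate FROM Ownership
    ) AS driver
    JOIN Ownership os
        ON driver.PropertyId = os.PropertyId
       AND driver.thedate BETWEEN os.StartDate AND os.EndDate
    GROUP BY os.PropertyId, driver.thedate
    HAVING SUM(os.OwnershipPercent) > 100
    ORDER BY os.PropertyId, driver.thedate
    """
).fetchall()

print(over_allocated)
# [(1, '2010-01-01', 170), (1, '2012-01-01', 170)]
```

Because BETWEEN is inclusive on both ends, a period that ends on the same day another one starts would count as overlapping on that day.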

Mari · Answer 2 · 2013-01-02T12:54:39+0000

DISTINCT will do the right thing

    SELECT A.PropertyId
    FROM Ownership A
    INNER JOIN Ownership B
        ON a.PersonId <> b.PersonId
       AND A.PropertyId = B.PropertyId
       AND A.StartDate <= B.EndDate
       AND A.EndDate >= B.StartDate
    GROUP BY A.PropertyId
    HAVING (SUM(DISTINCT A.OwnershipPercent)) <= 100;

Aleksandr Fedorenko · Answer 3 · 2013-01-02T15:44:38+0000

This query is probably what you need:

    SELECT PropertyID
    FROM dbo.Ownership
    GROUP BY PropertyID, StartDate, EndDate
    HAVING COUNT(PersonID) > 1
       AND SUM(OwnershipPercent) <= 100 -- in your question you want > 100

Andriy M · Answer 4 · 2013-01-03T10:29:12+0000

The following is similar to @Gordon Linoff's suggestion in that it also "decomposes" the list of ranges into a list of start and end dates. However, it builds the resulting list with a different method. It also assumes that only the start date is inclusive, while the end date is not.

    WITH unpivoted AS (
        SELECT
            PropertyId,
            EventDate,
            OwnershipPercent,
            PercentFactor = CASE EventDateType WHEN 'EndDate' THEN -1 ELSE 1 END
        FROM Ownership
        UNPIVOT (
            EventDate FOR EventDateType IN (StartDate, EndDate)
        ) u
    ),
    summedup AS (
        SELECT DISTINCT
            PropertyId,
            EventDate,
            TotalPercent = SUM(OwnershipPercent * PercentFactor)
                           OVER (PARTITION BY PropertyId ORDER BY EventDate)
        FROM unpivoted
    )
    SELECT
        s.EventDate,
        s.TotalPercent,
        o.PropertyId,
        o.PersonId,
        o.StartDate,
        o.EndDate,
        o.OwnershipPercent
    FROM summedup s
    INNER JOIN Ownership o
        ON s.PropertyId = o.PropertyId
       AND s.EventDate >= o.StartDate
       AND s.EventDate < o.EndDate
    WHERE TotalPercent > 100  -- changed from the original "<= 100"
                              -- based on the verbal description
    ;

To explain how this works, I will assume the contents of Ownership are as follows:

    PropertyId PersonId StartDate  EndDate    OwnershipPercent
    ---------- -------- ---------- ---------- ----------------
    1          1        2010-01-01 2012-01-01 80
    1          2        2011-01-01 2011-03-01 20
    1          3        2011-02-01 2011-04-01 10
    1          4        2011-05-01 2011-07-01 40

Now you can see that in the first step, the unpivoting, not only is each row of the source table replaced by two rows, but each percentage value is also marked as either an increment (PercentFactor = 1) or a decrement (PercentFactor = -1), depending on whether it comes with a start date or with an end date. So, the unpivoted CTE evaluates to the following result set:

    PropertyId EventDate  OwnershipPercent PercentFactor
    ---------- ---------- ---------------- -------------
    1          2010-01-01 80               1
    1          2011-01-01 20               1
    1          2011-02-01 10               1
    1          2011-03-01 20               -1
    1          2011-04-01 10               -1
    1          2011-05-01 40               1
    1          2011-07-01 40               -1
    1          2012-01-01 80               -1

At this point, the idea is to calculate running totals of OwnershipPercent for each EventDate within each PropertyId, taking into account whether the value is an increment or a decrement. (In fact, you could put the sign on OwnershipPercent in the first step instead of introducing a separate PercentFactor column. I chose the latter as a slightly better illustration of the idea, but there should be no performance penalty if you prefer the former.) And this is what you get after calculating the running totals (which is what the second CTE, summedup, does):

    PropertyId EventDate  TotalPercent
    ---------- ---------- ------------
    1          2010-01-01 80
    1          2011-01-01 100
    1          2011-02-01 110
    1          2011-03-01 90
    1          2011-04-01 80
    1          2011-05-01 120
    1          2011-07-01 80
    1          2012-01-01 0

Note, however, that this result set may contain duplicate rows. In particular, this will happen if, for the same PropertyId, some ranges start or end on the same date, or some range ends exactly on the start date of another range. That is why DISTINCT is used at this point.

Now that the total percentages on the key dates are known, the rows that do not exceed 100 can simply be filtered out, and the rest joined back to Ownership to get the details of the owners who contribute to those totals. So, the main query gives you this as the final result:

    EventDate  TotalPercent PropertyId PersonId StartDate  EndDate    OwnershipPercent
    ---------- ------------ ---------- -------- ---------- ---------- ----------------
    2011-02-01 110          1          1        2010-01-01 2012-01-01 80
    2011-02-01 110          1          2        2011-01-01 2011-03-01 20
    2011-02-01 110          1          3        2011-02-01 2011-04-01 10
    2011-05-01 120          1          1        2010-01-01 2012-01-01 80
    2011-05-01 120          1          4        2011-05-01 2011-07-01 40
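
The increment/decrement idea can also be sketched outside SQL. The following is only an illustration in plain Python, assuming the sample Ownership rows above and ISO date strings; the variable names are made up for this sketch and it is not a translation of the query's execution plan.

```python
from collections import defaultdict

# (PropertyId, PersonId, StartDate, EndDate, OwnershipPercent), as in the sample above.
ownership = [
    (1, 1, "2010-01-01", "2012-01-01", 80),
    (1, 2, "2011-01-01", "2011-03-01", 20),
    (1, 3, "2011-02-01", "2011-04-01", 10),
    (1, 4, "2011-05-01", "2011-07-01", 40),
]

# "Unpivot": a start date adds the share, an end date removes it.
# Summing per (property, date) merges events that fall on the same date.
deltas = defaultdict(int)
for property_id, _person_id, start, end, percent in ownership:
    deltas[(property_id, start)] += percent
    deltas[(property_id, end)] -= percent

# Running total per property, in date order (ISO strings sort chronologically).
running_totals = []
current_property, total = None, 0
for (property_id, event_date), delta in sorted(deltas.items()):
    if property_id != current_property:
        current_property, total = property_id, 0
    total += delta
    running_totals.append((property_id, event_date, total))

# Keep only the dates on which the combined share exceeds 100%.
over_allocated = [row for row in running_totals if row[2] > 100]
print(over_allocated)
# [(1, '2011-02-01', 110), (1, '2011-05-01', 120)]
```

The two dates it reports match the EventDate values in the final result above.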

You can also see (as well as play with) this query in SQL Fiddle.
