SQL Server: a grouping question that's bugging me

I have been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking ... I'm not sure what the answer is!) problem has me stumped. It feels like it should be easy, too. I will generalize my problem:

Let's say I have 3 employees (don't worry about them leaving or anything ... there are always 3), and I keep track of how I distribute their salaries every month.

    Month  Employee  PercentOfTotal
    --------------------------------
    1      Alice     25%
    1      Barbara   65%
    1      Claire    10%
    2      Alice     25%
    2      Barbara   50%
    2      Claire    25%
    3      Alice     25%
    3      Barbara   65%
    3      Claire    10%

As you can see, I paid them the same percentages in months 1 and 3, but in month 2 Alice got the same 25% while Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I have ever given. In this case there are two: one for months 1 and 3, and one for month 2.

I expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter):

    ID  Employee  PercentOfTotal
    --------------------------------
    X   Alice     25%
    X   Barbara   65%
    X   Claire    10%
    Y   Alice     25%
    Y   Barbara   50%
    Y   Claire    25%

Seems easy, right? I'm stumped! Does anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I wonder if there is a better way. Or maybe a different way that I can learn something from.

    WITH temp_ids (Month) AS (
        SELECT DISTINCT MIN(Month)
        FROM employees_paid
        GROUP BY PercentOfTotal
    )
    SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
    FROM employees_paid EMP
    JOIN temp_ids IDS ON EMP.Month = IDS.Month
    GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks! -Ricky

5 answers

I suspect performance won't be great (because of the subquery):

    SELECT *
    FROM employees_paid
    WHERE Month NOT IN (
        SELECT a.Month
        FROM employees_paid a
        INNER JOIN employees_paid b
            ON (a.Employee = b.Employee
                AND a.PercentOfTotal = b.PercentOfTotal
                AND a.Month > b.Month)
        GROUP BY a.Month, b.Month
        HAVING COUNT(*) = (SELECT COUNT(*)
                           FROM employees_paid c
                           WHERE c.Month = a.Month)
    )
  • The inner SELECT does a self-join to identify matching combinations of employee and percentage (other than those within the same month). The > in the JOIN ensures that only one set of matches is taken: if the Month 1 entries equal the Month 3 entries, we get only the Month 3 to Month 1 combination, rather than Month 1 to Month 3, Month 3 to Month 1 and Month 3 to Month 3.
  • We then GROUP BY each month-to-month combination and COUNT the matching entries.
  • The HAVING then discards combinations where the number of matches is less than the number of entries the month has.
  • The outer SELECT returns all rows except those for months returned by the inner query (the months that fully match an earlier month).
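To make the steps above concrete, here is a minimal sketch that runs the self-join and grouping on their own, without the HAVING filter, against the sample data from the question. The inline `employees_paid` CTE and the column aliases `LaterMonth`, `EarlierMonth` and `MatchingRows` are assumptions added for illustration; they are not part of the original answer.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH employees_paid (Month, Employee, PercentOfTotal) AS (
    SELECT 1, 'Alice',   0.25 UNION ALL
    SELECT 1, 'Barbara', 0.65 UNION ALL
    SELECT 1, 'Claire',  0.10 UNION ALL
    SELECT 2, 'Alice',   0.25 UNION ALL
    SELECT 2, 'Barbara', 0.50 UNION ALL
    SELECT 2, 'Claire',  0.25 UNION ALL
    SELECT 3, 'Alice',   0.25 UNION ALL
    SELECT 3, 'Barbara', 0.65 UNION ALL
    SELECT 3, 'Claire',  0.10
)
-- Count, for every (later month, earlier month) pair, how many
-- employee/percentage rows are identical in both months.
SELECT a.Month  AS LaterMonth,
       b.Month  AS EarlierMonth,
       COUNT(*) AS MatchingRows
FROM employees_paid a
INNER JOIN employees_paid b
    ON  a.Employee       = b.Employee
    AND a.PercentOfTotal = b.PercentOfTotal
    AND a.Month          > b.Month
GROUP BY a.Month, b.Month
ORDER BY a.Month, b.Month;
-- Pairs produced: (2, 1) with 1 match, (3, 1) with 3 matches, (3, 2) with 1 match.
-- Only (3, 1) matches on all 3 rows of the later month, so the HAVING clause
-- keeps month 3 and the outer NOT IN removes it, leaving months 1 and 2.
```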
+2
source

This gives you the answer in a slightly different format than you asked:

    SELECT DISTINCT
        T1.PercentOfTotal AS Alice,
        T2.PercentOfTotal AS Barbara,
        T3.PercentOfTotal AS Claire
    FROM employees_paid T1
    JOIN employees_paid T2
        ON T1.Month = T2.Month
        AND T1.Employee = 'Alice'
        AND T2.Employee = 'Barbara'
    JOIN employees_paid T3
        ON T2.Month = T3.Month
        AND T3.Employee = 'Claire'

Result:

    Alice  Barbara  Claire
    25%    50%      25%
    25%    65%      10%

If you want, you can use UNPIVOT to get this result set in the requested form.

    SELECT rn AS ID, Employee, PercentOfTotal
    FROM (
        SELECT *, ROW_NUMBER() OVER (ORDER BY Alice) AS rn
        FROM (
            SELECT DISTINCT
                T1.PercentOfTotal AS Alice,
                T2.PercentOfTotal AS Barbara,
                T3.PercentOfTotal AS Claire
            FROM employees_paid T1
            JOIN employees_paid T2
                ON T1.Month = T2.Month
                AND T1.Employee = 'Alice'
                AND T2.Employee = 'Barbara'
            JOIN employees_paid T3
                ON T2.Month = T3.Month
                AND T3.Employee = 'Claire'
        ) T1
    ) p
    UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt

Result:

    ID  Employee  PercentOfTotal
    1   Alice     25%
    1   Barbara   50%
    1   Claire    25%
    2   Alice     25%
    2   Barbara   65%
    2   Claire    10%
+4
source

What you want is to treat each month's distribution as a signature, or pattern of values, that you then look for in other months. What is unclear is whether the employee to whom a value was assigned matters as much as the breakdown of percentages. For example, would Alice = 65%, Barbara = 25%, Claire = 10% be the same as month 3 in your example? In my example, I assumed that it would not be the same. Similar to Martin Smith's solution, I find the signatures by multiplying each percentage by a power of 10 (10 raised to the employee's rank within the month) and summing the results. This assumes that all percentages are less than one. If someone could have a percentage of 110%, for example, this would create problems for this solution.

    With Employees As
        (
        Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
        Union All Select 1, 'Barbara', .65
        Union All Select 1, 'Claire', .10
        Union All Select 2, 'Alice', .25
        Union All Select 2, 'Barbara', .50
        Union All Select 2, 'Claire', .25
        Union All Select 3, 'Alice', .25
        Union All Select 3, 'Barbara', .65
        Union All Select 3, 'Claire', .10
        )
        , EmployeeRanks As
        (
        Select Month, Employee, PercentOfTotal
            , Row_Number() Over ( Partition By Month
                                  Order By Employee, PercentOfTotal ) As ItemRank
        From Employees
        )
        , Signatures As
        (
        Select Month
            , Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
        From EmployeeRanks
        Group By Month
        )
        , DistinctSignatures As
        (
        Select Min(Month) As MinMonth, SignatureValue
        From Signatures
        Group By SignatureValue
        )
    Select E.Month, E.Employee, E.PercentOfTotal
    From Employees As E
        Join DistinctSignatures As D
            On D.MinMonth = E.Month
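As a worked illustration of the signature arithmetic, the sketch below reuses the first two CTEs from the query above and returns the signature computed for each month of the sample data. Nothing new is assumed beyond the final `Order By`, which is added only to make the output easier to read.

```sql
With Employees As
    (
    Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
    Union All Select 1, 'Barbara', .65
    Union All Select 1, 'Claire', .10
    Union All Select 2, 'Alice', .25
    Union All Select 2, 'Barbara', .50
    Union All Select 2, 'Claire', .25
    Union All Select 3, 'Alice', .25
    Union All Select 3, 'Barbara', .65
    Union All Select 3, 'Claire', .10
    )
    , EmployeeRanks As
    (
    -- Alice, Barbara and Claire get ranks 1, 2 and 3 within each month.
    Select Month, Employee, PercentOfTotal
        , Row_Number() Over ( Partition By Month
                              Order By Employee, PercentOfTotal ) As ItemRank
    From Employees
    )
-- Each percentage is weighted by 10^rank, then summed per month.
Select Month
    , Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
From EmployeeRanks
Group By Month
Order By Month;
-- Month 1: 0.25*10 + 0.65*100 + 0.10*1000 = 167.5
-- Month 2: 0.25*10 + 0.50*100 + 0.25*1000 = 302.5
-- Month 3: same rows as month 1, so the signature is 167.5 again
```

Months 1 and 3 share a signature, so `DistinctSignatures` keeps only month 1 for that pattern.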
+3
source

If I understand correctly, then for a general solution I think you need to concatenate the whole group together, e.g. to produce Alice:0.25, Barbara:0.50, Claire:0.25. Then select the distinct groups, so something like the following would do it (rather clunkily).

    WITH EmpSalaries AS
    (
        SELECT 1 AS Month, 'Alice' AS Employee, 0.25 AS PercentOfTotal UNION ALL
        SELECT 1 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
        SELECT 1 AS Month, 'Claire' AS Employee, 0.10 UNION ALL
        SELECT 2 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
        SELECT 2 AS Month, 'Barbara' AS Employee, 0.50 UNION ALL
        SELECT 2 AS Month, 'Claire' AS Employee, 0.25 UNION ALL
        SELECT 3 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
        SELECT 3 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
        SELECT 3 AS Month, 'Claire' AS Employee, 0.10
    ),
    Months AS
    (
        SELECT DISTINCT Month FROM EmpSalaries
    ),
    MonthlySummary AS
    (
        SELECT Month,
            Stuff(
                (
                Select ', ' + S1.Employee + ':' + cast(PercentOfTotal as varchar(20))
                From EmpSalaries As S1
                Where S1.Month = Months.Month
                Order By S1.Employee
                For Xml Path('')
                ), 1, 2, '') As Summary
        FROM Months
    )
    SELECT *
    FROM EmpSalaries
    WHERE Month IN (SELECT MIN(Month)
                    FROM MonthlySummary
                    GROUP BY Summary)
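On SQL Server 2017 or later, the same concatenate-then-deduplicate idea can probably be written more compactly with `STRING_AGG` instead of `Stuff(... For Xml Path(''))`. The following is only a sketch under that version assumption, not part of the original answer; the CTE names are kept from the query above.

```sql
-- Assumes SQL Server 2017+ (STRING_AGG is not available in earlier versions).
WITH EmpSalaries (Month, Employee, PercentOfTotal) AS
(
    SELECT 1, 'Alice',   0.25 UNION ALL
    SELECT 1, 'Barbara', 0.65 UNION ALL
    SELECT 1, 'Claire',  0.10 UNION ALL
    SELECT 2, 'Alice',   0.25 UNION ALL
    SELECT 2, 'Barbara', 0.50 UNION ALL
    SELECT 2, 'Claire',  0.25 UNION ALL
    SELECT 3, 'Alice',   0.25 UNION ALL
    SELECT 3, 'Barbara', 0.65 UNION ALL
    SELECT 3, 'Claire',  0.10
),
MonthlySummary AS
(
    -- One row per month, e.g. 'Alice:0.25, Barbara:0.65, Claire:0.10'.
    -- WITHIN GROUP fixes the order so equal distributions yield equal strings.
    SELECT Month,
           STRING_AGG(Employee + ':' + CAST(PercentOfTotal AS varchar(20)), ', ')
               WITHIN GROUP (ORDER BY Employee) AS Summary
    FROM EmpSalaries
    GROUP BY Month
)
-- Keep only the earliest month for each distinct summary string.
SELECT E.Month, E.Employee, E.PercentOfTotal
FROM EmpSalaries AS E
WHERE E.Month IN (SELECT MIN(Month)
                  FROM MonthlySummary
                  GROUP BY Summary);
```

Because the grouping happens directly in `MonthlySummary`, the separate `Months` CTE is no longer needed in this variant.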
+2
source

"I just put together this solution while writing this question, which seems to work"

I do not think this works. Here I have added two more groups (month = 4 and 5, respectively), which I would consider distinct, yet the result is unchanged: only month = 1 and 2 are returned:

    WITH employees_paid (Month, Employee, PercentOfTotal)
    AS
    (
        SELECT 1, 'Alice', 0.25
        UNION ALL SELECT 1, 'Barbara', 0.65
        UNION ALL SELECT 1, 'Claire', 0.1
        UNION ALL SELECT 2, 'Alice', 0.25
        UNION ALL SELECT 2, 'Barbara', 0.5
        UNION ALL SELECT 2, 'Claire', 0.25
        UNION ALL SELECT 3, 'Alice', 0.25
        UNION ALL SELECT 3, 'Barbara', 0.65
        UNION ALL SELECT 3, 'Claire', 0.1
        UNION ALL SELECT 4, 'Barbara', 0.25
        UNION ALL SELECT 4, 'Claire', 0.65
        UNION ALL SELECT 4, 'Alice', 0.1
        UNION ALL SELECT 5, 'Diana', 0.25
        UNION ALL SELECT 5, 'Emma', 0.65
        UNION ALL SELECT 5, 'Fiona', 0.1
    ),
    temp_ids (Month)
    AS
    (
        SELECT DISTINCT MIN(Month)
        FROM employees_paid
        GROUP BY PercentOfTotal
    )
    SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
    FROM employees_paid AS EMP
    INNER JOIN temp_ids AS IDS
        ON EMP.Month = IDS.Month
    GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal;
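One way to see why months 4 and 5 go missing is to look at what `temp_ids` is built from. It groups by individual `PercentOfTotal` values, not by a month's whole set of employee and percentage pairs, so any month that only reuses percentages already seen earlier never contributes a `MIN(Month)` of its own. A minimal sketch with the same sample rows; the alias `FirstMonth` is an assumption added for illustration:

```sql
WITH employees_paid (Month, Employee, PercentOfTotal) AS
(
    SELECT 1, 'Alice', 0.25
    UNION ALL SELECT 1, 'Barbara', 0.65
    UNION ALL SELECT 1, 'Claire', 0.1
    UNION ALL SELECT 2, 'Alice', 0.25
    UNION ALL SELECT 2, 'Barbara', 0.5
    UNION ALL SELECT 2, 'Claire', 0.25
    UNION ALL SELECT 3, 'Alice', 0.25
    UNION ALL SELECT 3, 'Barbara', 0.65
    UNION ALL SELECT 3, 'Claire', 0.1
    UNION ALL SELECT 4, 'Barbara', 0.25
    UNION ALL SELECT 4, 'Claire', 0.65
    UNION ALL SELECT 4, 'Alice', 0.1
    UNION ALL SELECT 5, 'Diana', 0.25
    UNION ALL SELECT 5, 'Emma', 0.65
    UNION ALL SELECT 5, 'Fiona', 0.1
)
-- The grouping behind temp_ids, with the grouping key shown next to it.
SELECT PercentOfTotal, MIN(Month) AS FirstMonth
FROM employees_paid
GROUP BY PercentOfTotal
ORDER BY PercentOfTotal;
-- 0.10 -> 1, 0.25 -> 1, 0.50 -> 2, 0.65 -> 1
-- The distinct FirstMonth values are 1 and 2 only: months 4 and 5 use no
-- percentage that months 1 and 2 have not already used.
```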
+2
source

Source: https://habr.com/ru/post/1312805/

