Puzzle: the distribution of numbers evenly across groups

It is rather a puzzle. He was probably asked elsewhere before, but I could not find anything, so I decided to share this question.

I am trying to implement some load balancing in the application and reduced the problem to what, in my opinion, should be a simple TSQL exercise (the application mainly belongs to the SQL Server domain (SQL Server 2008 R2).

Basically, I have a table with two integers; unique, consistent identifier and unique value. A table can contain any number of records, and I would like to create a data table in which the first n largest values ​​are divided into separate "groups", and then the second set of n largest values ​​is divided into separate "groups".

I have a first project that works below, but I believe that it can be improved ...

DECLARE @GroupCount INT = 5 -- Set up the test data DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT) INSERT @Test (Value) VALUES (100), (456), (121), (402), (253), (872), (765), (6529), (1029), (342), (98), (1), (0), (4), (46), (23), (456), (416), (2323), (4579) --Order by Value descending ;WITH cte AS ( SELECT * ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum FROM @Test ) --use modulus to split into grouping , cte2 AS ( SELECT * ,ROW_NUMBER() OVER (PARTITION BY RowNum % @GroupCount ORDER BY RowNum DESC) Rnk FROM cte ) SELECT ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) AS 'Grouping' ,Value ,Id FROM cte2 ORDER BY [Grouping], Value ASC 

This works and creates the following dataset:

 Grouping, Value, Id ======== ===== == 1 46 15 1 342 10 1 765 7 1 6529 8 2 23 16 2 253 5 2 456 2 2 4579 20 3 4 14 3 121 3 3 456 17 3 2323 19 4 1 12 4 100 1 4 416 18 4 1029 9 5 0 13 5 98 11 5 402 4 5 872 6 

The returned data set is true in that the first n largest values ​​are divided into separate groups and so on, but the common values ​​in each group differ from each other compared to group 5 (for example).

When grouped and SUMmed, we can see an odd spread:

 Grouping, SummedValues ======== ============ 1 7682 2 5311 3 2904 4 1546 5 1372 

As few lines as possible, how can I better balance the values ​​so that the total values ​​in each group are more evenly distributed?

+7
sql sql-server tsql sql-server-2008-r2
source share
5 answers

This is false, but not scary for the example data. Your mileage may vary.

 declare @groupcount int = 5; create table t (id int identity(1, 1), value int); insert t values (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342) , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579); ;with cte as ( select * , rn = row_number() over (order by value asc) , pct = value/sum(value+.0) over() , target = 1.0 / @groupcount from t ) , remaining as ( select id, value, rn , grp = convert(int,(sum(value) over (order by rn)/sum(value+.0) over())*@groupCount)+1 from cte ) select grp = row_number() over (order by sum(value) desc) , sumValue = sum(value) from remaining group by grp 

Demo version of rexter: http://rextester.com/UNV61100

results:

 +-----+----------+ | grp | sumValue | +-----+----------+ | 1 | 6529 | | 2 | 4579 | | 3 | 3483 | | 4 | 2323 | | 5 | 1901 | +-----+----------+ 

<h / "> Compatible version of Sql Server 2008:

 declare @groupcount int = 5; create table t (id int identity(1, 1), value int); insert t values (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342) , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579); ;with cte as ( select * , rn = row_number() over (order by value asc) , pct = value/tv.TotalValue , target = 1.0 / @groupcount from t cross join (select TotalValue = sum(value+.0) from t) tv ) , remaining as ( select id, value, rn , grp = convert(int,((x.sumValueOver/TotalValue)*@groupcount)+1) from cte outer apply ( select sumValueOver = sum(value) from cte i where i.rn <= cte.rn ) x ) select grp = row_number() over (order by sum(value) desc) , sumValue = sum(value) from remaining group by grp 

registry: http://rextester.com/DEUDJ77007

returns:

 +-----+----------+ | grp | sumValue | +-----+----------+ | 1 | 6529 | | 2 | 4579 | | 3 | 3483 | | 4 | 2323 | | 5 | 1901 | +-----+----------+ 
+1
source share

Here the NTILE function in sql server can help you.

 DECLARE @GroupCount INT = 5 -- Set up the test data DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT) INSERT @Test (Value) SELECT 100 UNION ALL SELECT 456 UNION ALL SELECT 121 UNION ALL SELECT 402 UNION ALL SELECT 253 UNION ALL SELECT 872 UNION ALL SELECT 765 UNION ALL SELECT 6529 UNION ALL SELECT 1029 UNION ALL SELECT 342 UNION ALL SELECT 98 UNION ALL SELECT 1 UNION ALL SELECT 0 UNION ALL SELECT 4 UNION ALL SELECT 46 UNION ALL SELECT 23 UNION ALL SELECT 456 UNION ALL SELECT 416 UNION ALL SELECT 2323 UNION ALL SELECT 4579 ;With cte AS ( SELECT *, NTILE(@GroupCount) OVER(ORDER BY Value DESC) AS GroupNo FROM @Test ) SELECT GroupNo, SUM(Value) AS SummedValues FROM cte GROUP BY GroupNo 

and I get this result.

 GroupNo SummedValues -------------------- 1 14460 2 2549 3 1413 4 365 5 28 
+1
source share

A slightly better way to do this is with snakes of choice. You build the 1st, 6th, 11th maximum - of course, thus, above the 5th, 10th, 15th.

Better would be 1, 10, 11, 5, 6, 15. Still not perfect, and your data is still very bad, but a little better than yours.

 DECLARE @GroupCount INT = 5 -- Set up the test data DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT) INSERT @Test (Value) SELECT 100 UNION ALL SELECT 456 UNION ALL SELECT 121 UNION ALL SELECT 402 UNION ALL SELECT 253 UNION ALL SELECT 872 UNION ALL SELECT 765 UNION ALL SELECT 6529 UNION ALL SELECT 1029 UNION ALL SELECT 342 UNION ALL SELECT 98 UNION ALL SELECT 1 UNION ALL SELECT 0 UNION ALL SELECT 4 UNION ALL SELECT 46 UNION ALL SELECT 23 UNION ALL SELECT 456 UNION ALL SELECT 416 UNION ALL SELECT 2323 UNION ALL SELECT 4579 --Order by Value descending ;WITH cte AS ( SELECT * ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum FROM @Test ) --use modulus to split into grouping , cte2 AS ( SELECT * ,ROW_NUMBER() OVER (PARTITION BY RowNum % (@GroupCount*2 ) ORDER BY RowNum DESC) Rnk FROM cte ) select [Grouping], SUM(value) from ( SELECT floor(abs(@GroupCount - (ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) - 0.5)) + 0.5) AS 'Grouping' ,Value ,Id FROM cte2 --ORDER BY [Grouping], Value ASC ) a group by [Grouping] order by [Grouping] ASC 

Ultimately, although I think that a random assignment is probably better than this, perhaps a random assignment, checking that the sum is no longer 2 * (1 / grouping * of total).

Indeed, I think this is not a problem well resolved by TSQL or any SQL; languages ​​that can control flow line by line will serve you better. Python, C #, SAS, any other tool that is in your toolbox. (PL / SQL is one place I would consider here ...)

Everything that allows you to say at the line level: β€œHaving tracked what I have assigned so far, assign this particular case to the bucket with the lowest number so far” will actually work better.

 Grouping Summed Values --------------------- 1 1781 2 1608 3 2904 4 5249 5 7273 
+1
source share

Using the ntile and row_number allows not only to split it into even groups (even by account, but not by total), but it is better to decide what values ​​to include in each group to align everything in each group as much as possible.

Answer:

 select case b.grp_split when 1 then b.grp_split_rnk_desc else grp_split_rnk_asc end as [grouping] , b.value , b.id from ( select a.id , a.value , a.grp_split , row_number() over (partition by a.grp_split order by a.value desc) grp_split_rnk_desc , row_number() over (partition by a.grp_split order by a.value asc) grp_split_rnk_asc from ( select t.id , t.value , ntile(@ntile_cnt) over (order by t.value desc) as grp_split from @test as t ) as a ) as b order by case b.grp_split when 1 then b.grp_split_rnk_desc else grp_split_rnk_asc end asc , b.value asc 

Results:

Not perfect, but a little closer.

 Group Total 1 7029 2 5096 3 2904 4 1761 5 2025 
+1
source share

The primary result is determined by the first largest values. So you can try to arrange everything else in reverse order

 WITH cte AS ( SELECT * ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum FROM @Test ) --use modulus to split into grouping , cte2 AS ( SELECT * ,ROW_NUMBER() OVER (PARTITION BY RowNum % @GroupCount ORDER BY RowNum ) Rnk FROM cte ) ,cte3 AS (SELECT ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY case rnk when 1 then Value else -Value end DESC) AS [Grouping] ,Value ,Id FROM cte2 ) select [Grouping],sum(value) from cte3 group by [Grouping] order by [Grouping]; 

Result

  Grouping (No column name) 1 1 7029 2 2 5096 3 3 2904 4 4 1761 5 5 2025 
0
source share

All Articles