How can I normalize the capitalization of a group column?

In SQL Server configured as case-insensitive, GROUP BY can have interesting results when the [n][var]char column is not the first GROUP BY column. Essentially, it looks like whichever row it encounters “first” (where “first” is undefined in the absence of an order) wins for that grouping. For instance:

    select x.[day], x.[name], count(1) as [count]
    from (
        select 1 as [day], 'a' as [name]
        union all select 1, 'A'
        union all select 2, 'A'
        union all select 2, 'a'
    ) x
    group by x.[day], x.[name]

which returns for me:

    day         name count
    ----------- ---- -----------
    1           A    2
    2           a    2

Using min(x.[name]) has no effect since the grouping has already occurred.

I cannot add ORDER BY in front of GROUP BY, as this is illegal; and adding ORDER BY after GROUP BY simply determines the output order after grouping - it still gives A and a.

So: is there a sane way to do this such that the capitalization is at least consistent across all groups? (I will leave the problem of being consistent across separate runs for another day.)

Desired output:

    day         name count
    ----------- ---- -----------
    1           A    2
    2           A    2

or

    day         name count
    ----------- ---- -----------
    1           a    2
    2           a    2

Edit: without destroying the capitalization when it is already consistent between groups. So no upper/lower. That is, if one of the groups consistently has the value BcDeF, I want the result for that row to be BcDeF, not bcdef or BCDEF.

+6
4 answers

I would use window functions for this. By using ROW_NUMBER and partitioning with the case-insensitive collation, but ordering with a case-sensitive one, we consistently pick one result with its original capitalization, while still grouping the values as if they were the same:

    WITH CTE AS (
        SELECT *,
               RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
                                      ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
               N = COUNT(*) OVER(PARTITION BY [day], [name])
        FROM (
            select 1 as [day], 'a' as [name]
            union all select 1, 'A'
            union all select 2, 'A'
            union all select 2, 'a'
            union all select 3, 'BcDeF'
            union all select 3, 'bCdEf'
        ) X
    )
    SELECT *
    FROM CTE
    WHERE RN = 1;

It returns:

    ╔═════╦═══════╦════╦═══╗
    ║ day ║ name  ║ RN ║ N ║
    ╠═════╬═══════╬════╬═══╣
    ║ 1   ║ A     ║ 1  ║ 2 ║
    ║ 2   ║ A     ║ 1  ║ 2 ║
    ║ 3   ║ BcDeF ║ 1  ║ 2 ║
    ╚═════╩═══════╩════╩═══╝

Following @AndriyM's comment, if you want the same capitalization across the entire result set, and not just on the same day, you can use:

    WITH CTE AS (
        SELECT *,
               RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
                                      ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
               N = COUNT(*) OVER(PARTITION BY [day], [name])
        FROM (
            select 1 as [day], 'a' as [name]
            union all select 1, 'A'
            union all select 2, 'A'
            union all select 2, 'a'
            union all select 3, 'BcDeF'
            union all select 3, 'bCdEf'
        ) X
    )
    SELECT [day],
           MAX([name] COLLATE SQL_Latin1_General_Cp1_CS_AS) OVER (PARTITION BY [name]) [name],
           N
    FROM CTE
    WHERE RN = 1;
+9

Use upper() or lower():

    select x.[day], lower(x.[name]) as name, count(1) as [count]
    from (
        select 1 as [day], 'a' as [name]
        union all select 1, 'A'
        union all select 2, 'A'
        union all select 2, 'a'
    ) x
    group by x.[day], x.[name];

You are correct that SQL Server picks the value from an indeterminate row. min() and max() do not help, since under the case-insensitive collation the values are equivalent. The easiest solution is to explicitly choose the case you need.
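As a hedged sketch of the point about min() and max(): the values only compare as equal under the case-insensitive collation, so applying a case-sensitive collation inside the aggregate should make the choice deterministic while the GROUP BY still uses the column's own case-insensitive collation. This is not part of the answer above; the collation name is borrowed from the window-function answer and is an assumption about what is available on your server.

```sql
-- Sketch: group case-insensitively, but pick the displayed spelling
-- with a case-sensitive comparison so the result no longer depends on row order.
select x.[day],
       min(x.[name] collate SQL_Latin1_General_CP1_CS_AS) as [name],
       count(1) as [count]
from (
    select 1 as [day], 'a' as [name]
    union all select 1, 'A'
    union all select 2, 'A'
    union all select 2, 'a'
) x
group by x.[day], x.[name]   -- still uses the case-insensitive default collation
order by x.[day];
```

Unlike lower() or upper(), this keeps a spelling that actually occurs in the data, so a group that consistently contains BcDeF would still return BcDeF.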

+2

Use a case-insensitive collation in GROUP BY, for example:

    select day, name, count(*)
    from tablename
    group by day, name collate SQL_Latin1_General_Cp1_CI_AS_KI_WI

Perhaps SQL Server has a problem here? In another DBMS, it runs like this:

    SQL>create table t (d int, name varchar(10));
    SQL>insert into t values (1,'A');
    SQL>insert into t values (2,'A');
    SQL>insert into t values (2,'a');
    SQL>insert into t values (3,'BcDeF');
    SQL>insert into t values (3,'bCdEf');
    SQL>insert into t values (4,'a');
    SQL>select d, name, count(*)
    SQL&from t
    SQL&group by d, name collate english_1;
              d name
    =========== ========== ====================
              1 A                             1
              2 A                             2
              3 BcDeF                         2
              4 a                             1

                      4 rows found

Where english_1 is a case-insensitive collation.

As expected?

+2

You can use UPPER in the GROUP BY to convert all values to the same capitalization.
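A minimal sketch of that suggestion, reusing the sample data from the question. Grouping on the same upper(...) expression that is selected keeps the query valid and returns A for both days. Note that it overwrites the original capitalization, which the question's edit wants to preserve when a group is already consistent.

```sql
-- Sketch: normalize the case in both the select list and the GROUP BY.
select x.[day], upper(x.[name]) as [name], count(1) as [count]
from (
    select 1 as [day], 'a' as [name]
    union all select 1, 'A'
    union all select 2, 'A'
    union all select 2, 'a'
) x
group by x.[day], upper(x.[name])
order by x.[day];
```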

0
