How can I normalize the capitalization of a group column?

Question

In SQL Server configured as case-insensitive, group by can produce interesting results when the [n][var]char column is not the first group by column. Essentially, it looks like whichever row it encounters "first" (where "first" is undefined in the absence of an order by) wins for that group. For instance:

    select x.[day], x.[name], count(1) as [count]
    from (
        select 1 as [day], 'a' as [name]
        union all select 1, 'A'
        union all select 2, 'A'
        union all select 2, 'a'
    ) x
    group by x.[day], x.[name]

which returns for me:

    day         name count
    ----------- ---- -----------
    1           A    2
    2           a    2

Using min(x.[name]) has no effect since the grouping has already occurred.

I cannot add order by in front of group by, as that is illegal; and adding order by after group by simply determines the output order after grouping. It still gives A and a.

So: is there a sane way of doing this so that the capitalization is at least consistent across all the groups? (I will leave the problem of being consistent between separate runs for another day.)

Desired output:

    day         name count
    ----------- ---- -----------
    1           A    2
    2           A    2

or

    day         name count
    ----------- ---- -----------
    1           a    2
    2           a    2

Edit: without destroying the capitalization when it is consistent between groups. So no upper / lower. That is, if one of the groups consistently has the value BcDeF, I want the result for that row to be BcDeF, not bcdef or BCDEF.

+6

sql sql-server case-insensitive group-by case-sensitive

Marc Gravell Sep 22 '16 at 11:09

4 answers

Use upper() or lower():

    select x.[day], lower(x.[name]) as name, count(1) as [count]
    from (
        select 1 as [day], 'a' as [name]
        union all select 1, 'A'
        union all select 2, 'A'
        union all select 2, 'a'
    ) x
    group by x.[day], x.[name];

You are correct that SQL Server picks the value from an arbitrary row. min() and max() do not help, since the values compare as equal under the case-insensitive collation. The easiest solution is to explicitly choose the case you want.
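As a hedged aside on why min() has no effect here: the comparison inside the aggregate uses the column's case-insensitive collation, so 'a' and 'A' tie. A minimal sketch, assuming the database default collation is case-insensitive and that SQL_Latin1_General_CP1_CS_AS is an acceptable case-sensitive collation for the data (both are assumptions, not something stated in the question), is to apply the case-sensitive collation only inside the aggregate while still grouping case-insensitively:

```sql
-- Sketch only: group with the (assumed) case-insensitive default collation,
-- but compare case-sensitively inside min() so the winner is deterministic.
select x.[day],
       min(x.[name] collate SQL_Latin1_General_CP1_CS_AS) as [name],
       count(1) as [count]
from (
    select 1 as [day], 'a' as [name]
    union all select 1, 'A'
    union all select 2, 'A'
    union all select 2, 'a'
) x
group by x.[day], x.[name];
```

Which spelling wins depends on the sort rules of the chosen collation. The result is only consistent between groups that contain the same set of spellings, since each group picks its own minimum.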

+2

Gordon Linoff Sep 22 '16 at 11:12

Use a case-insensitive collation in the group by, for example:

    select day, name, count(*)
    from tablename
    group by day, name collate SQL_Latin1_General_Cp1_CI_AS_KI_WI

Perhaps SQL Server has a problem here? With another DBMS, it behaves like this:

    SQL>create table t (d int, name varchar(10));
    SQL>insert into t values (1,'A');
    SQL>insert into t values (2,'A');
    SQL>insert into t values (2,'a');
    SQL>insert into t values (3,'BcDeF');
    SQL>insert into t values (3,'bCdEf');
    SQL>insert into t values (4,'a');
    SQL>select d, name, count(*)
    SQL&from t
    SQL&group by d, name collate english_1;
              d name
    =========== ========== ====================
              1 A                             1
              2 A                             2
              3 BcDeF                         2
              4 a                             1

                      4 rows found

Here english_1 is a case-insensitive collation.

As expected?

+2

jarlh Sep 22 '16 at 11:12

You can use UPPER in the GROUP BY to convert all values to the same capitalization.
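A minimal sketch of that suggestion, reusing the sample rows from the question. Note that it forces every name to upper case, so a consistently spelled value such as BcDeF would come back as BCDEF, which the question's edit rules out:

```sql
-- Sketch only: normalize the name with upper() in both the select list
-- and the group by, so every group reports the same capitalization.
select x.[day],
       upper(x.[name]) as [name],
       count(1) as [count]
from (
    select 1 as [day], 'a' as [name]
    union all select 1, 'A'
    union all select 2, 'A'
    union all select 2, 'a'
) x
group by x.[day], upper(x.[name]);
```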

0

przemo_li Sep 22 '16 at 11:12

Lamak · Accepted Answer · 2016-09-22T13:03:16+0000

I would use window functions for this. By using ROW_NUMBER, partitioning with the case-insensitive collation but ordering with a case-sensitive one, we consistently pick one row with its original capitalization, while the rows are still grouped as if they were the same:

    WITH CTE AS
    (
        SELECT *,
               RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
                                      ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
               N = COUNT(*) OVER(PARTITION BY [day], [name])
        FROM (
            select 1 as [day], 'a' as [name]
            union all select 1, 'A'
            union all select 2, 'A'
            union all select 2, 'a'
            union all select 3, 'BcDeF'
            union all select 3, 'bCdEf'
        ) X
    )
    SELECT *
    FROM CTE
    WHERE RN = 1;

It returns:

    ╔═════╦═══════╦════╦═══╗
    ║ day ║ name  ║ RN ║ N ║
    ╠═════╬═══════╬════╬═══╣
    ║   1 ║ A     ║  1 ║ 2 ║
    ║   2 ║ A     ║  1 ║ 2 ║
    ║   3 ║ BcDeF ║  1 ║ 2 ║
    ╚═════╩═══════╩════╩═══╝

Following @AndriyM's comment, if you want the same capitalization across the entire result set, and not just on the same day, you can use:

    WITH CTE AS
    (
        SELECT *,
               RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
                                      ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
               N = COUNT(*) OVER(PARTITION BY [day], [name])
        FROM (
            select 1 as [day], 'a' as [name]
            union all select 1, 'A'
            union all select 2, 'A'
            union all select 2, 'a'
            union all select 3, 'BcDeF'
            union all select 3, 'bCdEf'
        ) X
    )
    SELECT [day],
           MAX([name] COLLATE SQL_Latin1_General_Cp1_CS_AS) OVER (PARTITION BY [name]) [name],
           N
    FROM CTE
    WHERE RN = 1;
