How to do the opposite of "row_number() over (partition by [Col] order by [Col])"

Question

I am trying to combine duplicate entries in a data table and give them a new number.

Here is an example data set (runnable copy):

    declare @tmpTable table (ID Varchar(1), First varchar(4), Last varchar(5), Phone varchar(13), NonKeyField varchar(4))

    insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
    insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
    insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
    insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
    insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
    insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
    insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'

    select row_number() over (partition by First, Last, Phone order by ID) NewIDNum, *
    from @tmpTable
    order by ID

This gives me the following results:

    NewIDNum             ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    2                    B    John  Smith (555)555-1234 GHJK
    1                    C    Jane  Smith (555)555-1234 QWER
    3                    D    John  Smith (555)555-1234 RTYU
    1                    E    Bill  Blake (555)555-0000 BVNM
    2                    F    Bill  Blake (555)555-0000 %^&*
    4                    G    John  Smith (555)555-1234 !#RF

However, this is the opposite of what I want: NewIDNum restarts its counter every time it finds a new combination of the key columns. I want every row with the same combination to get the same ID. If it behaved the way I want, I would get the following results:

    NewIDNum             ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    1                    B    John  Smith (555)555-1234 GHJK
    2                    C    Jane  Smith (555)555-1234 QWER
    1                    D    John  Smith (555)555-1234 RTYU
    3                    E    Bill  Blake (555)555-0000 BVNM
    3                    F    Bill  Blake (555)555-0000 %^&*
    1                    G    John  Smith (555)555-1234 !#RF

What is the right way to get the results I want?

I did not include this requirement in the original post: NewIDNum must produce the same numbers for existing rows on subsequent runs of this query if more rows are added (assuming every new row has a higher ID "value" when the rows are ordered by the ID column).

So, if the following were run at a later date:

    insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
    insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
    insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'

running the correct query again will give

    NewIDNum             ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    1                    B    John  Smith (555)555-1234 GHJK
    2                    C    Jane  Smith (555)555-1234 QWER
    1                    D    John  Smith (555)555-1234 RTYU
    3                    E    Bill  Blake (555)555-0000 BVNM
    3                    F    Bill  Blake (555)555-0000 %^&*
    1                    G    John  Smith (555)555-1234 !#RF
    1                    H    John  Smith (555)555-1234 4321
    4                    I    Jake  Jons  (555)555-1234 1234
    1                    J    John  Smith (555)555-1234 2345

+7

sql sql-server sql-server-2005 row-number

Scott Chamberlain Oct 2 '12 at 18:12

4 answers

Based on @Andomar's original answer, this will work with your updated requirement (although it is unlikely to scale well):

    select DENSE_RANK() over (ORDER BY IdRank, First, Last, Phone) AS NewIDNum,
           ID, First, Last, Phone, NonKeyField
    from (
        select MIN(ID) OVER (PARTITION BY First, Last, Phone) as IdRank, *
        from @tmpTable
    ) as x
    order by ID;

+1

etliens Oct 2 '12 at 18:36

Using Andomar's answer as a jumping-off point, I solved it myself:

    select sub1.rn, tt.*
    from @tmpTable tt
    inner join (
        select row_number() over (order by min(ID)) as rn, first, last, phone
        from @tmpTable
        group by first, last, phone
    ) as sub1
        on  tt.first = sub1.first
        and tt.last  = sub1.last
        and tt.phone = sub1.phone

This gives:

    rn                   ID   First Last  Phone         NonKeyField
    -------------------- ---- ----- ----- ------------- -----------
    1                    A    John  Smith (555)555-1234 ASDF
    1                    B    John  Smith (555)555-1234 GHJK
    1                    D    John  Smith (555)555-1234 RTYU
    1                    G    John  Smith (555)555-1234 !#RF
    1                    H    John  Smith (555)555-1234 4321
    1                    J    John  Smith (555)555-1234 2345
    2                    C    Jane  Smith (555)555-1234 QWER
    3                    E    Bill  Blake (555)555-0000 BVNM
    3                    F    Bill  Blake (555)555-0000 %^&*
    4                    I    Jake  Jons  (555)555-1234 1234

Judging by the SQL execution plans, Andomar's answer will be faster than mine for large data sets (53% of the run time versus 47% of the run time when the two queries are run side by side with "Include Actual Execution Plan" enabled).
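
The percentages shown next to each query in a plan are, as far as I know, relative cost estimates rather than measured run time, so it may be worth checking them against actual timings. A minimal sketch of one way to do that, assuming SQL Server Management Studio and only a few of the sample rows from the question (a meaningful comparison would need a table of realistic size):

```sql
-- Sketch only: a handful of the question's sample rows, enough to run both queries.
declare @tmpTable table (ID varchar(1), First varchar(4), Last varchar(5), Phone varchar(13), NonKeyField varchar(4))

insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'

-- Report CPU/elapsed time and logical reads for each statement that follows.
set statistics time on
set statistics io on

-- Query 1: window functions only (the approach from Andomar's answer)
select dense_rank() over (order by min_id) as NewIDNum, ID, First, Last, Phone, NonKeyField
from (
    select min(ID) over (partition by First, Last, Phone) as min_id, *
    from @tmpTable
) as sub1
order by ID

-- Query 2: join against a grouped subquery (the approach from this answer)
select sub1.rn, tt.*
from @tmpTable tt
inner join (
    select row_number() over (order by min(ID)) as rn, First, Last, Phone
    from @tmpTable
    group by First, Last, Phone
) as sub1
    on  tt.First = sub1.First
    and tt.Last  = sub1.Last
    and tt.Phone = sub1.Phone
order by tt.ID

set statistics time off
set statistics io off
```

The timings and read counts appear in the Messages tab, one set per statement. With this few rows they will be close to zero, so treat the script as a template, not as a benchmark.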

0

Scott Chamberlain Oct 2 '12 at 18:42

This should work

    select dense_rank() over (order by First, Last, Phone) NewIDNum, *
    from @tmpTable
    order by ID
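
One caveat with ranking directly on the key columns: the numbers follow the sort order of (First, Last, Phone), so they can shift when a new combination sorts ahead of existing ones. The sketch below illustrates this with a reduced version of the question's data; the table name `@t` is hypothetical and the NonKeyField column is left out for brevity.

```sql
-- Hypothetical reduced table: one row per key combination from the question.
declare @t table (ID varchar(1), First varchar(4), Last varchar(5), Phone varchar(13))

insert into @t select 'A', 'John', 'Smith', '(555)555-1234'
insert into @t select 'C', 'Jane', 'Smith', '(555)555-1234'
insert into @t select 'E', 'Bill', 'Blake', '(555)555-0000'

-- First run: groups are numbered alphabetically by First (Bill, Jane, John).
select dense_rank() over (order by First, Last, Phone) as NewIDNum, *
from @t
order by ID
-- A -> 3, C -> 2, E -> 1

-- A later row whose key sorts between 'Bill' and 'Jane'.
insert into @t select 'I', 'Jake', 'Jons', '(555)555-1234'

-- Second run: Jane and John are each pushed up by one.
select dense_rank() over (order by First, Last, Phone) as NewIDNum, *
from @t
order by ID
-- A -> 4, C -> 3, E -> 1, I -> 2
```

So this query gives every row in a group the same number, but it does not keep the numbers of existing rows fixed across runs, which is what the updated requirement in the question asks for.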

-1

iruvar Oct 2 '12 at 18:20

Andomar · Accepted Answer · 2012-10-02T18:15:44+0000

You can use dense_rank():

 dense_rank() over (order by First, Last, Phone) as NewIDNum

In response to your comment, you can order by the minimum of the old ID column for each group of rows that share the same (First, Last, Phone) combination:

    select *
    from (
        select dense_rank() over (order by min_id) as new_id, *
        from (
            select min(id) over (partition by First, Last, Phone) as min_id, *
            from @tmpTable
        ) as sub1
    ) as sub3
    order by new_id
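
To see why ordering by the smallest ID keeps the numbering stable, here is a self-contained sketch that runs this approach against the question's sample data twice, before and after the later inserts. It assumes, as the question does, that new rows always receive a higher ID than existing ones; the explicit column list and the `order by ID` are choices made here to match the question's expected output.

```sql
declare @tmpTable table (ID varchar(1), First varchar(4), Last varchar(5), Phone varchar(13), NonKeyField varchar(4))

insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'

-- First run: each group is identified by its smallest ID ('A', 'C', 'E'),
-- and dense_rank numbers the groups in that order.
select dense_rank() over (order by min_id) as NewIDNum, ID, First, Last, Phone, NonKeyField
from (
    select min(ID) over (partition by First, Last, Phone) as min_id, *
    from @tmpTable
) as sub1
order by ID
-- A, B, D, G -> 1;  C -> 2;  E, F -> 3

-- Rows added at a later date, all with higher IDs.
insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'

-- Second run: existing groups keep their smallest ID, so their numbers are unchanged;
-- the new (Jake, Jons) group has min_id 'I' and is numbered after them.
select dense_rank() over (order by min_id) as NewIDNum, ID, First, Last, Phone, NonKeyField
from (
    select min(ID) over (partition by First, Last, Phone) as min_id, *
    from @tmpTable
) as sub1
order by ID
-- A, B, D, G, H, J -> 1;  C -> 2;  E, F -> 3;  I -> 4
```

The stability depends on rows never being deleted and on IDs only ever growing: removing row 'A', for example, would change the John Smith group's smallest ID and could renumber the groups.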
