How to find duplicate records and remove the oldest from SQL?

I have a table whose rows are unique except for the value in one column (call it "Name"), which can repeat. Another column is Date, which is the date the row was added to the database.

What I want to do is find duplicate values in "Name" and then delete those that have the oldest dates in "Date", leaving the most recent.

This seems to be a relatively simple query, but I know very little about SQL besides simple queries.

Any ideas?

3 answers

Find duplicates and delete the oldest

Here is the code

-- Sample data: two 'Chocolate' rows with different dates
create table #Product
(
    ID int identity(1, 1) primary key,
    Name varchar(800),
    DateAdded datetime default getdate()
)

insert #Product(Name) select 'Chocolate'
insert #Product(Name, DateAdded) select 'Candy', GETDATE() + 1
insert #Product(Name, DateAdded) select 'Chocolate', GETDATE() + 5

select * from #Product

-- Rank rows within each Name, newest first, then delete everything below rank 1
;with Ranked as
(
    select ID,
           dense_rank() over (partition by Name order by DateAdded desc) as DupeCount
    from #Product P
)
delete R
from Ranked R
where R.DupeCount > 1

select * from #Product
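One detail of the ranking approach worth checking is how ties are handled: dense_rank() gives the same rank to rows that share the newest date, so both survive the delete, while row_number() with an extra tie-breaker keeps exactly one. The sketch below illustrates this in SQLite through Python's sqlite3 module rather than SQL Server; the table name product, the column date_added, the sample dates, and the id tie-breaker are assumptions made for the illustration, and it needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, date_added TEXT)"
)
conn.executemany(
    "INSERT INTO product (name, date_added) VALUES (?, ?)",
    [
        ("Chocolate", "2024-01-01"),
        ("Candy", "2024-01-02"),
        ("Chocolate", "2024-01-06"),
        ("Chocolate", "2024-01-06"),  # ties with the previous row on date
    ],
)

# Compute both rankings side by side, newest first within each name.
ranked = conn.execute(
    """
    SELECT id,
           name,
           dense_rank() OVER (PARTITION BY name ORDER BY date_added DESC) AS dense_rk,
           row_number() OVER (PARTITION BY name ORDER BY date_added DESC, id DESC) AS row_num
    FROM product
    ORDER BY id
    """
).fetchall()

# Rows each ranking would delete with a "rank > 1" filter.
deleted_by_dense_rank = [row[0] for row in ranked if row[2] > 1]
deleted_by_row_number = [row[0] for row in ranked if row[3] > 1]

print("dense_rank deletes ids:", deleted_by_dense_rank)
print("row_number deletes ids:", deleted_by_row_number)
```

If two rows can share the same Name and the same date, the row_number() variant is the one that guarantees a single survivor per name.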

-- mytable stands in for your table name
delete from mytable a1
where exists (
    select *
    from mytable a2
    where a2.name = a1.name
      and a2.date > a1.date
)
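As a rough check of the EXISTS approach, here is a minimal sketch that runs the same idea against an in-memory SQLite database via Python's sqlite3 module. The table name product, the column date_added, and the sample rows are assumptions for the illustration; alias rules for DELETE differ between database engines, so the outer table is referenced by its own name here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, date_added TEXT)")
conn.executemany(
    "INSERT INTO product (name, date_added) VALUES (?, ?)",
    [
        ("Chocolate", "2024-01-01"),
        ("Candy", "2024-01-02"),
        ("Chocolate", "2024-01-06"),
    ],
)

# Delete every row for which a newer row with the same name exists.
conn.execute(
    """
    DELETE FROM product
    WHERE EXISTS (
        SELECT 1
        FROM product AS newer
        WHERE newer.name = product.name
          AND newer.date_added > product.date_added
    )
    """
)

remaining = conn.execute(
    "SELECT name, date_added FROM product ORDER BY name"
).fetchall()
print(remaining)
```

Only the older 'Chocolate' row is removed; names that appear once are untouched because no newer row exists for them.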


You could probably achieve this with a self-join and IS NOT NULL.

DELETE queries with joins can be a little dangerous: the more complex the join, the greater the risk of deleting more than you intend.

But this is how I would approach it.

DELETE a.*
FROM mytable AS a
LEFT JOIN mytable AS b
    ON b.name = a.name
   AND (b.date > a.date OR (b.date = a.date AND b.rowid > a.rowid))
WHERE b.rowid IS NOT NULL

The join and IS NOT NULL detect every row for which there is a newer row with the same name. It also correctly handles the case of two rows with the same date: if the dates match, it falls back to comparing rowid (whatever your unique row identifier is).
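To make the tie-breaking rule concrete, and to reduce the risk of deleting too much, one option is to preview the matching rows with a SELECT before running the DELETE. The sketch below does this in SQLite through Python's sqlite3 module. SQLite has no DELETE with JOIN, so the same condition is written as a correlated subquery instead; the id column standing in for rowid and the sample rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, name, date) VALUES (?, ?, ?)",
    [
        (1, "Chocolate", "2024-01-01"),
        (2, "Chocolate", "2024-01-06"),
        (3, "Chocolate", "2024-01-06"),  # same date as id 2, higher id wins
        (4, "Candy", "2024-01-02"),
    ],
)

# Rows that have a "better" duplicate: same name and either a newer date,
# or the same date with a higher id.
superseded = """
    SELECT a.id
    FROM mytable AS a
    WHERE EXISTS (
        SELECT 1
        FROM mytable AS b
        WHERE b.name = a.name
          AND (b.date > a.date OR (b.date = a.date AND b.id > a.id))
    )
"""

# Preview first, so the set of rows can be checked before anything is deleted.
to_delete = [row[0] for row in conn.execute(superseded + " ORDER BY a.id")]
print("would delete ids:", to_delete)

# Then delete exactly that set.
conn.execute(f"DELETE FROM mytable WHERE id IN ({superseded})")

remaining = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print("remaining ids:", remaining)
```

Running the preview and the delete inside one transaction, and rolling back if the preview looks wrong, is a reasonable extra safeguard.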

Hope something like this works.
