How to remove duplicate entries in a table?

I have a table in a test database that apparently got a little confused when I ran the INSERT scripts to set it up. The schema is as follows:

    ID            UNIQUEIDENTIFIER
    TYPE_INT      SMALLINT
    SYSTEM_VALUE  SMALLINT
    NAME          VARCHAR
    MAPPED_VALUE  VARCHAR

It should have a few dozen rows. It has about 200,000, most of which are duplicates in which TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are identical but the ID is not.

Now I could write a script to clean up this table: one that creates a temporary table in memory, uses INSERT .. SELECT DISTINCT to capture all the unique values, TRUNCATEs the original table and then copies everything back. But is there an easier way to do this, such as a DELETE query with something special in the WHERE clause?
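For reference, the temporary-table approach described above might look roughly like the sketch below. The table name dbo.MAPPING_TABLE is hypothetical (the question does not give one), and the sketch assumes the existing ID values do not need to be preserved, so fresh ones are generated with NEWID().

```sql
-- Hypothetical table name dbo.MAPPING_TABLE; substitute the real one.
BEGIN TRAN;

-- Capture one copy of every distinct combination (ID is deliberately left out).
SELECT DISTINCT TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
INTO #distinct_rows
FROM dbo.MAPPING_TABLE;

-- Empty the original table. TRUNCATE is refused if foreign keys reference it.
TRUNCATE TABLE dbo.MAPPING_TABLE;

-- Copy the distinct rows back, assigning a new identifier to each.
INSERT INTO dbo.MAPPING_TABLE (ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE)
SELECT NEWID(), TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
FROM #distinct_rows;

DROP TABLE #distinct_rows;

COMMIT;
```

If other tables refer to the old ID values, this variant is not suitable, because every surviving row receives a new identifier.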

+2
sql sql-server-2008
3 answers

You do not specify the name of your table, but I think something like this should work. It keeps only the record with the highest ID in each group of duplicates. You can test it with the ROLLBACK in place first!

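    BEGIN TRAN;

    -- Delete every row for which an identical row with a higher ID exists,
    -- so the row with the highest ID in each group survives.
    DELETE T1
    FROM <table_name> T1
    WHERE EXISTS (
        SELECT *
        FROM <table_name> T2
        WHERE T1.TYPE_INT = T2.TYPE_INT
          AND T1.SYSTEM_VALUE = T2.SYSTEM_VALUE
          AND T1.NAME = T2.NAME
          AND T1.MAPPED_VALUE = T2.MAPPED_VALUE
          AND T2.ID > T1.ID
    );

    -- Inspect the result, then undo; switch to COMMIT once it looks right.
    SELECT * FROM <table_name>;

    ROLLBACK;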
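One way to check the result before committing is to list the combinations that still occur more than once. This is only a sketch: the table name dbo.MAPPING_TABLE and the column alias copies are hypothetical.

```sql
-- Hypothetical table name dbo.MAPPING_TABLE; substitute the real one.
-- Lists each combination of the four value columns that occurs more than once.
SELECT TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE, COUNT(*) AS copies
FROM dbo.MAPPING_TABLE
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Run before the DELETE, this shows how bad the duplication is; run after it (inside the same transaction), it should return no rows. One caveat: GROUP BY treats NULLs as equal, while the `=` comparisons in the EXISTS subquery do not, so duplicates that contain a NULL in any of the four columns would still show up here after the DELETE.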
+4

Here is a great article, Duplicate Removal, which essentially uses this pattern:

    WITH q AS (
        SELECT d.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
        FROM t_duplicate d
    )
    DELETE FROM q
    WHERE rn > 1;

    SELECT * FROM t_duplicate;
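The pattern above uses the article's own table and columns. Adapted to the schema in the question, it might look like the following sketch. The table name dbo.MAPPING_TABLE and the CTE name numbered are hypothetical, and ordering by ID is just one arbitrary way of choosing which copy to keep.

```sql
-- Hypothetical table name dbo.MAPPING_TABLE; substitute the real one.
BEGIN TRAN;

WITH numbered AS (
    SELECT ID,
           ROW_NUMBER() OVER (
               PARTITION BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
               ORDER BY ID
           ) AS rn
    FROM dbo.MAPPING_TABLE
)
-- Rows numbered 2 and above within a partition are the surplus copies.
DELETE FROM numbered
WHERE rn > 1;

-- Inspect how many rows are left.
SELECT COUNT(*) AS remaining_rows
FROM dbo.MAPPING_TABLE;

ROLLBACK;  -- switch to COMMIT once the count looks right
```

Partitioning on the four value columns rather than on the ID is what makes each group of duplicates share one numbering sequence. PARTITION BY also places NULLs in the same partition, so duplicates containing NULLs are removed as well.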
+3
    WITH Duplicates (ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE) AS (
        -- One row per duplicated combination, carrying the ID to keep.
        SELECT MIN(Id) AS ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
        FROM T1
        GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
        HAVING COUNT(Id) > 1
    )
    DELETE FROM T1
    WHERE ID IN (
        -- Every row in a duplicated group except the one being kept.
        SELECT T1.Id
        FROM T1
        INNER JOIN Duplicates
            ON T1.TYPE_INT = Duplicates.TYPE_INT
           AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
           AND T1.NAME = Duplicates.NAME
           AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
           AND T1.Id <> Duplicates.ID
    );
+2