Check for duplicate data in SQL Server

Please do not ask me why, but I have a table with a lot of duplicated data, where every field of a row is duplicated.

For instance:

 alex, 1
 alex, 1
 liza, 32
 hary, 34

I need to remove one of the two alex, 1 rows from this table.

I know that any approach to this will probably be very inefficient, but that does not matter. I just need to delete the duplicate data.

What is the best way to do this? Please keep in mind that I do not have two fields, I actually have about 10 fields to check.

5 answers

Method A. You can get a deduplicated version of your data using

 SELECT field1, field2, ... INTO Deduped FROM Source GROUP BY field1, field2, ... 

For example, for your sample data,

 SELECT name, number FROM Source GROUP BY name, number 

gives

 alex 1
 hary 34
 liza 32

then just drop the old table and rename the new one. Of course, there are a number of fancier in-place solutions, but this is the clearest way to do it.
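
The drop-and-rename step could look roughly like the sketch below. It assumes the Source and Deduped table names from the example above and that nothing else depends on Source; the transaction wrapper is an addition for safety, not part of the original answer.

```sql
-- Sketch: replace Source with the deduplicated copy built by SELECT ... INTO.
-- Assumption: no indexes, constraints, or permissions on Source need to be
-- carried over (SELECT ... INTO does not copy them to the new table).
BEGIN TRANSACTION;

DROP TABLE Source;

-- sp_rename renames a user object; here Deduped takes over the name Source.
EXEC sp_rename 'Deduped', 'Source';

COMMIT TRANSACTION;
```

If Source does have indexes or constraints, they would have to be recreated on the renamed table afterwards.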

Method B. The in-place method is to create a primary key and remove duplicates in this way. For example, you can

 ALTER TABLE Source ADD sid INT IDENTITY(1,1); 

which makes Source look like this

 alex 1  1
 alex 1  2
 liza 32 3
 hary 34 4

then you can use

 DELETE FROM Source WHERE sid NOT IN (SELECT MIN(sid) FROM Source GROUP BY name, number) 

which will give the desired result. Of course, "NOT IN" is not particularly efficient, but it will do the job. Alternatively, you can JOIN against the grouped result (possibly stored in a temp table) and do the DELETE that way.
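
A minimal sketch of the JOIN variant follows, assuming the sid column added above. The temp table name #Keep and the column alias keep_sid are made up for illustration.

```sql
-- Sketch: store the sid to keep for each (name, number) group,
-- then delete every row whose sid is not in that set.
-- #Keep and keep_sid are hypothetical names.
SELECT MIN(sid) AS keep_sid
INTO #Keep
FROM Source
GROUP BY name, number;

-- The LEFT JOIN finds no partner for rows that are not the group minimum,
-- so k.keep_sid is NULL exactly for the rows to delete.
DELETE s
FROM Source AS s
LEFT JOIN #Keep AS k
       ON k.keep_sid = s.sid
WHERE k.keep_sid IS NULL;

DROP TABLE #Keep;
```

Because the join is on sid only, this also behaves sensibly when name or number contain NULLs: GROUP BY puts NULLs in one group, and the comparison never touches those columns.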


As you said, yes, it will be very inefficient, but you can try something like

 -- Sample data with one duplicated row
 DECLARE @TestTable TABLE (
     Name VARCHAR(20),
     SomeVal INT
 )

 INSERT INTO @TestTable SELECT 'alex', 1
 INSERT INTO @TestTable SELECT 'alex', 1
 INSERT INTO @TestTable SELECT 'liza', 32
 INSERT INTO @TestTable SELECT 'hary', 34

 SELECT * FROM @TestTable

 -- Number the rows within each group of identical values
 ;WITH DuplicateVals AS (
     SELECT *,
            ROW_NUMBER() OVER (PARTITION BY Name, SomeVal ORDER BY (SELECT NULL)) RowID
     FROM @TestTable
 )
 -- Keep the first row of each group and delete the rest
 DELETE FROM DuplicateVals
 WHERE RowID > 1

 SELECT * FROM @TestTable

I understand that this does not answer the specific question (removing the duplicates in the SAME table), but I offer this solution because it is very fast and might work best for the asker.

Speedy solution: if you do not mind creating a new table, create one with the same layout as the old table and call it NewTable.

Then run this SQL:

  Insert into NewTable Select name, num from OldTable group by name, num 

Just include each field name in the select and group by clauses.

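 -- Sample table with one duplicated row
 create table DuplicateTable (name varchar(10), number int)

 insert DuplicateTable values
     ('alex', 1),
     ('alex', 1),
     ('liza', 32),
     ('hary', 34);

 -- Number the rows within each group of identical values,
 -- then delete everything after the first row of each group
 with cte as (
     select *,
            row_number() over (partition by name, number order by name) RowNumber
     from DuplicateTable
 )
 delete cte
 where RowNumber > 1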

A slightly different solution that requires a primary key (or a unique index). Suppose you have your_table(id (PK), name, num):

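 -- Delete every row that has duplicates and is not the lowest id of its group.
 -- MIN(id) is used instead of TOP 1 without ORDER BY so that the surviving
 -- row is the same for every member of a group.
 DELETE t2
 FROM your_table AS t2
 WHERE (SELECT COUNT(*) FROM your_table y
        WHERE t2.name = y.name AND t2.num = y.num) > 1
   AND t2.id <> (SELECT MIN(z.id) FROM your_table z
                 WHERE t2.name = z.name AND t2.num = z.num);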

I assumed that name and num are NOT NULL; if they can contain NULL values, you need to change the WHERE conditions in the subqueries.
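
One possible way to adapt the conditions for nullable columns is sketched below. It assumes the same your_table(id, name, num) layout, treats two NULLs in the same column as equal, and keeps the row with the lowest id in each group; these are illustrative choices rather than the answer's own code.

```sql
-- Sketch: NULL-safe variant. A plain "=" never matches NULL to NULL,
-- so each column comparison gets an explicit "both are NULL" branch.
DELETE t2
FROM your_table AS t2
WHERE t2.id <> (
    SELECT MIN(z.id)
    FROM your_table AS z
    WHERE (z.name = t2.name OR (z.name IS NULL AND t2.name IS NULL))
      AND (z.num  = t2.num  OR (z.num  IS NULL AND t2.num  IS NULL))
);
```

Every row matches itself in the subquery, so MIN(z.id) is never NULL and unique rows are left untouched.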
