How to remove completely duplicate rows

Question

Let's say I have duplicate rows in my table, and my database design is third-rate:

 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Lux','cosmetic soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Crowning Glory','cosmetic soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (2,'Cinthol','nice soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');
 Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');

I want only one instance of each row in my table. So the 2nd, 3rd, and last rows, which are exact copies of earlier rows, must be removed. What query can I write for this? Can it be done without creating temporary tables, in a single query?

Thanks in advance:)

+7

sql sql-server tsql duplicate-removal sql-server-2008

TCM Jul 27 '10 at 15:32

4 answers

 DELETE tblProduct
 FROM tblProduct
 LEFT OUTER JOIN (
     SELECT MIN(ProductId) as ProductId, ProductName, Description, Category
     FROM tblProduct
     GROUP BY ProductName, Description, Category
 ) as KeepRows
     ON tblProduct.ProductId = KeepRows.ProductId
 WHERE KeepRows.ProductId IS NULL

Taken from "How to remove duplicate rows?"

UPDATE:

This will only work if ProductId is the primary key (which it is not here). You are better off using @marc_s's method, but I will leave this up in case someone whose table does have a primary key finds this post.
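To see why a non-unique ProductId breaks this approach, a dry run can count the rows the DELETE would remove without touching any data. This is only a sketch against the question's tblProduct table; the column alias RowsThatWouldBeDeleted is made up for illustration.

```sql
-- Dry run: same join and filter as the DELETE, but only counts the rows.
-- No data is modified.
SELECT COUNT(*) AS RowsThatWouldBeDeleted  -- hypothetical alias
FROM tblProduct
LEFT OUTER JOIN (
    SELECT MIN(ProductId) AS ProductId, ProductName, Description, Category
    FROM tblProduct
    GROUP BY ProductName, Description, Category
) AS KeepRows
    ON tblProduct.ProductId = KeepRows.ProductId
WHERE KeepRows.ProductId IS NULL;
```

With the sample rows from the question, every ProductId (1, 2, 3) also appears in KeepRows, so the join never produces a NULL and the count should come back as 0: the DELETE would leave all the duplicates in place.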

+4

Abe miessler Jul 27 '10 at 15:40

I needed to do this a few weeks ago... what version of SQL Server are you using? In SQL Server 2005 and above, you can use Row_Number as part of your select and keep only the rows where Row_Number is 1. I forget the exact syntax, but it is well documented... something along the lines of:

 Select t0.ProductID, t0.ProductName, t0.Description, t0.Category
 Into tblCleanData
 From (
     Select ProductID, ProductName, Description, Category,
            Row_Number() Over (
                Partition By ProductID, ProductName, Description, Category
                Order By ProductID, ProductName, Description, Category
            ) As RowNumber
     From tblProduct  -- the table from the question
 ) As t0
 Where t0.RowNumber = 1

See http://msdn.microsoft.com/en-us/library/ms186734.aspx , which should point you in the right direction.

+1

Benalabaster Jul 27 '10 at 15:44

First, use SELECT ... INTO to copy the distinct rows into a new table:

 SELECT DISTINCT ProductID, ProductName, Description, Category
 INTO tblProductClean
 FROM tblProduct

Then drop the original table.
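Put together, the whole swap might look like the sketch below. The rename step via sp_rename and the surrounding transaction are additions for illustration, not part of the answer as written. Note that SELECT ... INTO creates a bare table, so any indexes, constraints, or triggers on the original would have to be recreated afterwards.

```sql
BEGIN TRANSACTION;

-- Copy one instance of each distinct row into a new table.
SELECT DISTINCT ProductID, ProductName, Description, Category
INTO tblProductClean
FROM tblProduct;

-- Remove the original table, then give the clean copy its name.
-- (Assumption: nothing else, such as a foreign key, references tblProduct.)
DROP TABLE tblProduct;
EXEC sp_rename 'tblProductClean', 'tblProduct';

COMMIT TRANSACTION;
```

Unlike the single-statement approaches in the other answers, this does create a second table, which the question was hoping to avoid.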

0

eykanal Jul 27 '10 at 15:37

marc_s · Accepted Answer · 2010-07-27T15:51:55+0000

Try this - it will remove all duplicates from your table:

 ;WITH duplicates AS
 (
     SELECT ProductID, ProductName, Description, Category,
            -- Partition by every column so that only completely identical rows count as duplicates
            ROW_NUMBER() OVER (PARTITION BY ProductID, ProductName, Description, Category
                               ORDER BY ProductID) AS RowNum
     FROM dbo.tblProduct
 )
 DELETE FROM duplicates
 WHERE RowNum > 1
 GO

 SELECT * FROM dbo.tblProduct
 GO

Your duplicates should be gone now. Output:

 ProductID  ProductName     Description    Category
 1          Cinthol         cosmetic soap  soap
 1          Lux             cosmetic soap  soap
 1          Crowning Glory  cosmetic soap  soap
 2          Cinthol         nice soap      soap
 3          Lux             nice soap      soap
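One way to double-check the result is to group by all four columns and list anything that still occurs more than once. This is a small sketch against the same dbo.tblProduct table; the Occurrences alias is made up for illustration.

```sql
-- List every row that still appears more than once, comparing all four columns.
SELECT ProductID, ProductName, Description, Category,
       COUNT(*) AS Occurrences  -- hypothetical alias
FROM dbo.tblProduct
GROUP BY ProductID, ProductName, Description, Category
HAVING COUNT(*) > 1;
```

After the DELETE this should return no rows. Run before the DELETE, the same query serves as a preview of which rows are duplicated and how many copies of each exist.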
