Delete duplicate record from table - SQL query

Question

I need to remove only the duplicate rows from a table. For example, if the table contains 3 identical rows, my query should delete 2 of them and keep 1.

How can I do this? Please help me.

+6

sql database sql-server sql-delete

Santanu Nov 17 '09 at 12:14

7 answers

This works in SQL Server, although this is not a single statement:

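 -- Assumes you are deleting the duplicates for which some condition (e.g. Col1 = 1) is true.
 DECLARE @cnt int;
 SELECT @cnt = COUNT(*) FROM DupTable WHERE Col1 = 1;
 -- Delete all but one of the matching rows; skip when there is nothing to remove.
 IF @cnt > 1
     DELETE TOP (@cnt - 1) FROM DupTable WHERE Col1 = 1;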

It also requires no additional assumptions (for example, the existence of another column that makes each row unique). After all, Santanu said the rows were duplicates, not just one column.

However, the right answer, in my opinion, is to fix the table's structure: add an IDENTITY column to the table so that a single SQL statement can do the job. Like this:

 ALTER TABLE dbo.DupTable ADD IDCol int NOT NULL IDENTITY (1, 1)
 GO

Then removal is trivial:

 DELETE FROM DupTable WHERE IDCol NOT IN (SELECT MAX(IDCol) FROM DupTable GROUP BY Col1, Col2, Col3)
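
To see which rows survive, here is a minimal sketch that runs the same DELETE against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server: an INTEGER PRIMARY KEY AUTOINCREMENT column plays the role of the IDENTITY column, and the function name and sample rows are made up for illustration.

```python
import sqlite3


def dedupe_keep_max_id(rows):
    """Load rows into a scratch table, delete duplicates, return the survivors."""
    conn = sqlite3.connect(":memory:")
    # IDCol stands in for the IDENTITY column added by the ALTER TABLE above.
    conn.execute(
        "CREATE TABLE DupTable ("
        " IDCol INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Col1 INTEGER, Col2 TEXT, Col3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO DupTable (Col1, Col2, Col3) VALUES (?, ?, ?)", rows
    )
    # Same statement as in the answer: keep the highest IDCol of each group.
    conn.execute(
        "DELETE FROM DupTable WHERE IDCol NOT IN ("
        " SELECT MAX(IDCol) FROM DupTable GROUP BY Col1, Col2, Col3)"
    )
    survivors = conn.execute(
        "SELECT IDCol, Col1, Col2, Col3 FROM DupTable ORDER BY IDCol"
    ).fetchall()
    conn.close()
    return survivors


# Hypothetical data: three identical rows plus one unique row.
sample = [(1, "a", "x"), (1, "a", "x"), (1, "a", "x"), (2, "b", "y")]
result = dedupe_keep_max_id(sample)
print(result)
```

Of the three identical rows, only the one inserted last (the highest IDCol) remains, and the unique row is left alone.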

+4

Mark brittingham Nov 17 '09 at 13:55

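 DELETE t1 FROM MyTable t1 INNER JOIN MyTable t2 ON t1.colDup = t2.colDup AND t1.date > t2.date

Removes every duplicate row from MyTable (duplicated in the colDup column) except the oldest (i.e. the one with the lowest date). If several rows tie for the lowest date, all of them are kept.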
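
The "keep only the oldest row per colDup" behaviour can be tried out with Python's sqlite3 module. This is a sketch under assumptions: SQLite does not accept a join in DELETE, so a correlated EXISTS is swapped in for the self-join, and the table name MyTable and the sample dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (colDup TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO MyTable (colDup, date) VALUES (?, ?)",
    [
        ("a", "2009-11-01"),
        ("a", "2009-11-05"),
        ("a", "2009-11-09"),
        ("b", "2009-11-03"),
    ],
)

# Delete a row whenever an older row with the same colDup value exists.
# ISO-formatted date strings compare correctly as plain text.
conn.execute(
    "DELETE FROM MyTable WHERE EXISTS ("
    " SELECT 1 FROM MyTable AS older"
    " WHERE older.colDup = MyTable.colDup"
    " AND older.date < MyTable.date)"
)

remaining = conn.execute(
    "SELECT colDup, date FROM MyTable ORDER BY colDup, date"
).fetchall()
print(remaining)
conn.close()
```

Only the earliest row of each colDup group is left; a group with a single row is untouched.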

+3

jensgram Nov 17 '09 at 12:21

 DELETE FROM `mytbl` INNER JOIN ( SELECT 1 FROM `mytbl` GROUP BY `duplicated_column` HAVING COUNT(*)=2 ) USING(`id`)

Edit:

My bad, the above query will not work.

Assuming this table structure:

 id int auto_increment
 num int  # the column with duplicate values

The following query will work in MySQL (I checked):

 DELETE `mytbl` FROM `mytbl` INNER JOIN ( SELECT `num` FROM `mytbl` GROUP BY `num` HAVING COUNT(*)=2 ) AS `tmp` USING (`num`)

The query deletes the rows whose num value occurs exactly twice (no more). Note that it deletes both copies, not just the extra one.

Edit (again):

I suggest adding a key to the num column.

Edit (#3):

If the author wanted to remove fully duplicated rows, the following should work in MySQL (it worked for me):

 DELETE `delete_duplicated_rows` FROM `delete_duplicated_rows` NATURAL JOIN ( SELECT * FROM `delete_duplicated_rows` GROUP BY `num1` HAVING COUNT(*)=2 ) AS `der`

Provided that the structure of the table is as follows:

 CREATE TABLE `delete_duplicated_rows` (
   `num1` tinyint(4) NOT NULL,
   `num2` tinyint(4) NOT NULL
 ) ENGINE=MyISAM;

+2

Dor Nov 17 '09 at 12:21

If you have the ids of the rows you want to delete, then:

 DELETE FROM table WHERE id IN (1, 4, 7, [id numbers to delete...])

+1

user110714 Nov 17 '09 at 12:20

I think every table has a unique identifier. If it does, you can write the following query:

 DELETE table1 FROM table1 t1 WHERE 2 >= (SELECT COUNT(id) FROM table1 WHERE dupColumn = t1.dupColumn) AND t1.id NOT IN (SELECT MAX(id) FROM table1 WHERE dupColumn = t1.dupColumn)

Oops, it seems that only the second filter is needed:

 DELETE table1 FROM table1 t1 WHERE t1.id NOT IN (SELECT MAX(id) FROM table1 WHERE dupColumn = t1.dupColumn)

+1

Danil Nov 17 '09 at 13:00

 -- Just to demonstrate Mark's example.
 -- START === 1.0.dbo..DuplicatesTable.TableCreate.sql
 /****** Object: Table [dbo].[DuplicatesTable] Script Date: 03/29/2010 21:24:02 ******/
 IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
     DROP TABLE [dbo].[DuplicatesTable]
 GO
 /****** Object: Table [dbo].[DuplicatesTable] Script Date: 03/29/2010 21:24:02 ******/
 SET ANSI_NULLS ON
 GO
 SET QUOTED_IDENTIFIER ON
 GO
 CREATE TABLE [dbo].[DuplicatesTable](
     [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
     [ColB] [varchar](10) NULL, -- the description of the DuplicatesTable
 )
 /* <doc> Models a DuplicatesTable for </doc> */
 GO
 --============================================================ DuplicatesTable START
 declare @ScriptFileName varchar(2000)
 SELECT @ScriptFileName = '$(ScriptFileName)'
 SELECT @ScriptFileName + ' --- DuplicatesTable START ========================================='
 declare @TableName varchar(200)
 select @TableName = 'DuplicatesTable'
 SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
 SELECT name from sys.tables where name = @TableName
 DECLARE @TableCount INT
 SELECT @TableCount = COUNT(name) from sys.tables where name = @TableName
 if @TableCount = 1
     SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS '
 ELSE
     SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST '
 SELECT @ScriptFileName + ' --- DuplicatesTable END ========================================='
 --============================================================ DuplicatesTable END
 GO
 -- END === 1.0.dbo..DuplicatesTable.TableCreate.sql
 -- START === 1.1..dbo..DuplicatesTable.TableInsert.sql
 BEGIN TRANSACTION;
 INSERT INTO [dbo].[DuplicatesTable]([ColA], [ColB])
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA', N'ColB' UNION ALL
 SELECT N'ColA1', N'ColB1' UNION ALL
 SELECT N'ColA1', N'ColB1' UNION ALL
 SELECT N'ColA1', N'ColB1' UNION ALL
 SELECT N'ColA1', N'ColB1' UNION ALL
 SELECT N'ColA1', N'ColB1' UNION ALL
 SELECT N'ColA1', N'ColB1' UNION ALL
 SELECT N'ColA1', N'ColB1'
 COMMIT;
 RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
 GO
 -- END === 1.1..dbo..DuplicatesTable.TableInsert.sql
 -- START === 2.0.RemoveDuplicates.Script.sql
 ALTER TABLE dbo.DuplicatesTable ADD DuplicatesTableId int NOT NULL IDENTITY (1, 1)
 GO
 -- Then the delete is trivial:
 DELETE FROM dbo.DuplicatesTable WHERE DuplicatesTableId NOT IN (SELECT MAX(DuplicatesTableId) FROM dbo.DuplicatesTable GROUP BY ColA, ColB)
 Select * from DuplicatesTable;
 -- END === 2.0.RemoveDuplicates.Script.sql

+1

Yordan georgiev Sep 29 '10 at 14:48

Muhammad akhtar · Accepted Answer · 2009-11-17T13:11:08+0000

Please try the query below; it should do what you need.

 SET ROWCOUNT 1
 DELETE test FROM test a WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
 WHILE @@rowcount > 0
     DELETE test FROM test a WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
 SET ROWCOUNT 0

where test is the name of your table
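
The loop removes one duplicate per pass until no name occurs more than once. A rough sketch of the same idea with Python's sqlite3 module follows. SQLite has no SET ROWCOUNT, so as an assumption a LIMIT 1 subquery on rowid is used to pick the single row to delete in each pass; the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT)")
conn.executemany(
    "INSERT INTO test (name) VALUES (?)",
    [("ann",), ("ann",), ("ann",), ("bob",), ("cy",), ("cy",)],
)

# Pick one row whose name still occurs more than once and delete only that row.
# When no such row exists the subquery yields NULL and nothing is deleted.
delete_one = (
    "DELETE FROM test WHERE rowid = ("
    " SELECT a.rowid FROM test AS a"
    " WHERE (SELECT COUNT(*) FROM test AS b WHERE b.name = a.name) > 1"
    " LIMIT 1)"
)

passes = 0
# Mirrors WHILE @@rowcount > 0: stop once a pass deletes nothing.
while conn.execute(delete_one).rowcount > 0:
    passes += 1

remaining = sorted(row[0] for row in conn.execute("SELECT name FROM test"))
print(passes, remaining)
conn.close()
```

Each pass deletes exactly one surplus row, so the number of passes equals the number of surplus rows, which makes this approach slow on tables with many duplicates.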
