Delete duplicate rows based on single column value

I have the table below, and I need to delete rows that have a duplicate refID while keeping one row for each refID, i.e. I need to delete rows 4 and 5. Please help me with this.

+----+-------+--------+
| ID | refID | data   |
+----+-------+--------+
|  1 |  1023 | aaaaaa |
|  2 |  1024 | bbbbbb |
|  3 |  1025 | cccccc |
|  4 |  1023 | ffffff |
|  5 |  1023 | gggggg |
|  6 |  1022 | rrrrrr |
+----+-------+--------+
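The expected result implies a simple rule: for each refID keep one row and delete the rest. The sketch below states that rule in plain Python. Keeping the row with the lowest ID is an assumption inferred from the example (rows 4 and 5 go, row 1 stays), and ids_to_delete is a hypothetical helper name.

```python
# Sample data from the question: (ID, refID, data)
rows = [
    (1, 1023, "aaaaaa"),
    (2, 1024, "bbbbbb"),
    (3, 1025, "cccccc"),
    (4, 1023, "ffffff"),
    (5, 1023, "gggggg"),
    (6, 1022, "rrrrrr"),
]


def ids_to_delete(rows):
    """Return the IDs of all rows except the lowest-ID row for each refID."""
    lowest = {}  # refID -> lowest ID seen so far
    for row_id, ref_id, _data in rows:
        if ref_id not in lowest or row_id < lowest[ref_id]:
            lowest[ref_id] = row_id
    keep = set(lowest.values())
    return sorted(row_id for row_id, _ref, _data in rows if row_id not in keep)


print(ids_to_delete(rows))  # [4, 5]
```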
3 answers

This is similar to Gordon Linoff's query, but without a subquery:

 DELETE t1 FROM table t1 JOIN table t2 ON t2.refID = t1.refID AND t2.ID < t1.ID 

This uses an inner join to delete only those rows for which another row exists with the same refID but a lower ID.

The advantage of avoiding the subquery is that MySQL can use an index for the lookups. This query should perform well with a composite index on (refID, ID).
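A minimal sketch for trying the self-join condition on the question's sample data, assuming Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. SQLite has no multi-table DELETE ... JOIN syntax, so the same condition is written here as a correlated EXISTS subquery; the table name t and the index name idx_t_refid_id are hypothetical.

```python
import sqlite3

# Sample data from the question: (ID, refID, data)
ROWS = [
    (1, 1023, "aaaaaa"),
    (2, 1024, "bbbbbb"),
    (3, 1025, "cccccc"),
    (4, 1023, "ffffff"),
    (5, 1023, "gggggg"),
    (6, 1022, "rrrrrr"),
]

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for the answer's "table"
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, refID INTEGER, data TEXT)")
# Composite index on (refID, ID) to serve the join condition (name is hypothetical)
conn.execute("CREATE INDEX idx_t_refid_id ON t (refID, ID)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Same condition as the self-join: delete a row when another row
# has the same refID and a lower ID.
conn.execute("""
    DELETE FROM t
    WHERE EXISTS (
        SELECT 1 FROM t AS t2
        WHERE t2.refID = t.refID AND t2.ID < t.ID
    )
""")

remaining = [row[0] for row in conn.execute("SELECT ID FROM t ORDER BY ID")]
print(remaining)  # [1, 2, 3, 6]
```

The row with the lowest ID in each group never has a "lower" partner, so it always survives, regardless of the order in which rows are examined.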


In MySQL, you can do this with a join in the DELETE:

 delete t
 from table t
 left join (select min(id) as id from table t group by refId) tokeep
   on t.id = tokeep.id
 where tokeep.id is null;

For each refId, the subquery calculates the minimum value of the id column (this assumes id is unique across the whole table). The left join matches every row against these minimums, so any row without a match gets NULL for tokeep.id. Those are the rows that are deleted.
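The LEFT JOIN ... IS NULL anti-join can be tried on the sample data with a short sketch, assuming Python's sqlite3 module and an in-memory table with the hypothetical name t. Because SQLite does not accept a join in DELETE, the anti-join query is moved into an IN subquery that selects the IDs to remove; the tokeep derived table is the same as in the answer.

```python
import sqlite3

# Sample data from the question: (ID, refID, data)
ROWS = [
    (1, 1023, "aaaaaa"),
    (2, 1024, "bbbbbb"),
    (3, 1025, "cccccc"),
    (4, 1023, "ffffff"),
    (5, 1023, "gggggg"),
    (6, 1022, "rrrrrr"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, refID INTEGER, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Rows left unmatched by the anti-join: their ID is not the minimum of their refID
UNMATCHED_SQL = """
    SELECT t2.ID
    FROM t AS t2
    LEFT JOIN (SELECT MIN(ID) AS id FROM t GROUP BY refID) AS tokeep
        ON t2.ID = tokeep.id
    WHERE tokeep.id IS NULL
"""

unmatched = sorted(row[0] for row in conn.execute(UNMATCHED_SQL))
print(unmatched)  # [4, 5]

# Delete exactly those rows (the SQL is a constant, not user input)
conn.execute(f"DELETE FROM t WHERE ID IN ({UNMATCHED_SQL})")

remaining = [row[0] for row in conn.execute("SELECT ID FROM t ORDER BY ID")]
print(remaining)  # [1, 2, 3, 6]
```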


I would do:

 delete from t
 where ID not in (select min(ID) from table t group by refID having count(*) > 1)
   and refID in (select refID from table t group by refID having count(*) > 1)

The criteria: refID is duplicated, and ID differs from the minimum ID among those duplicates. This will perform better if refID is indexed.
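A sketch running this query against the sample data, assuming Python's sqlite3 module and a hypothetical table named t used consistently in the DELETE and in both subqueries. SQLite accepts a subquery that reads from the table being deleted from; MySQL rejects that (error 1093, "You can't specify target table ... for update in FROM clause") unless the subquery is wrapped in a derived table, so the query is not directly portable as written.

```python
import sqlite3

# Sample data from the question: (ID, refID, data)
ROWS = [
    (1, 1023, "aaaaaa"),
    (2, 1024, "bbbbbb"),
    (3, 1025, "cccccc"),
    (4, 1023, "ffffff"),
    (5, 1023, "gggggg"),
    (6, 1022, "rrrrrr"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, refID INTEGER, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Delete rows whose refID is duplicated and whose ID is not the
# minimum ID of any duplicated group.
conn.execute("""
    DELETE FROM t
    WHERE ID NOT IN (SELECT MIN(ID) FROM t GROUP BY refID HAVING COUNT(*) > 1)
      AND refID IN (SELECT refID FROM t GROUP BY refID HAVING COUNT(*) > 1)
""")

remaining = [row[0] for row in conn.execute("SELECT ID FROM t ORDER BY ID")]
print(remaining)  # [1, 2, 3, 6]
```

The second condition matters: without it, the NOT IN alone would also delete rows 2, 3 and 6, whose refIDs are not duplicated and therefore have no entry in the minimum list.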

Otherwise, provided that you can run the following statement repeatedly until it deletes nothing:

 delete from t where ID in (select max(ID) from table t group by refID having count(*) > 1) 
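The repeated variant can be automated by looping until a pass deletes no rows. This is a sketch assuming Python's sqlite3 module and a hypothetical table named t; cursor.rowcount reports how many rows each DELETE removed. Each pass removes only the highest ID in every group that still has duplicates, so the number of productive passes equals the size of the largest group minus one (two passes for refID 1023 here), followed by one final pass that deletes nothing.

```python
import sqlite3

# Sample data from the question: (ID, refID, data)
ROWS = [
    (1, 1023, "aaaaaa"),
    (2, 1024, "bbbbbb"),
    (3, 1025, "cccccc"),
    (4, 1023, "ffffff"),
    (5, 1023, "gggggg"),
    (6, 1022, "rrrrrr"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, refID INTEGER, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

deleted_per_pass = []  # rows removed by each pass that deleted something
while True:
    # Delete the highest ID in every refID group that still has duplicates
    cur = conn.execute("""
        DELETE FROM t
        WHERE ID IN (SELECT MAX(ID) FROM t GROUP BY refID HAVING COUNT(*) > 1)
    """)
    if cur.rowcount == 0:  # no duplicates left, stop
        break
    deleted_per_pass.append(cur.rowcount)

remaining = [row[0] for row in conn.execute("SELECT ID FROM t ORDER BY ID")]
print(deleted_per_pass, remaining)  # [1, 1] [1, 2, 3, 6]
```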

Source: https://habr.com/ru/post/1213055/

