DELETE all duplicate topics with multiple conditions

I am trying to write SQL that removes all duplicate titles, but only under these conditions:

  • should remove only duplicates with the same object_id
  • should keep only the newest entry, i.e. the one with the largest topic_id (topic_id is a unique auto-increment identifier for each topic)

So far I have this (testing with a SELECT first):

SELECT topic_id,object_id,title,url,date FROM topics GROUP BY title HAVING ( COUNT(title) > 1) ORDER BY topic_id DESC 

But it does not meet these conditions. I am using MySQL.
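To check which rows the conditions above target, group by both object_id and title instead of title alone. Then MAX(topic_id) is the row to keep in each group. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table layout and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, object_id INTEGER,"
    " title TEXT, url TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Hello", "/a", "2013-01-01"),
        (2, 10, "Hello", "/b", "2013-01-02"),
        (3, 20, "Hello", "/c", "2013-01-03"),  # same title, different object_id
        (4, 10, "Other", "/d", "2013-01-04"),
        (5, 10, "Hello", "/e", "2013-01-05"),
    ],
)

# Group by object_id AND title: a title only counts as a duplicate within
# the same object_id. MAX(topic_id) is the newest row, i.e. the one to keep.
duplicates = conn.execute(
    "SELECT object_id, title, COUNT(*) AS n, MAX(topic_id) AS keep_id"
    " FROM topics"
    " GROUP BY object_id, title"
    " HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [(10, 'Hello', 3, 5)]
```

Row 3 is not reported because its title is only repeated under a different object_id.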

Score: +4
4 answers

In MySQL you cannot reference the target table of a DML statement (here, the table you are deleting from) in a subquery. The exception is when you nest the subquery more than one level deep, but then the results are not reliable and you cannot use correlated subqueries.

Use a JOIN instead:

 DELETE td
 FROM topics td
 JOIN topics ti
   ON ti.object_id = td.object_id
  AND ti.title = td.title
  AND ti.topic_id > td.topic_id;

Create an index on topics (object_id, title, topic_id) so that this runs quickly.
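The self-join logic can be checked with the following sketch. SQLite stands in for MySQL, and the table, rows and index name are hypothetical. SQLite has no multi-table `DELETE ... JOIN`, so the sketch runs the same join condition as a SELECT to find the doomed ids, then deletes them by primary key.

```python
import sqlite3

# Hypothetical data; in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, object_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?)",
    [(1, 10, "Hello"), (2, 10, "Hello"), (3, 20, "Hello"), (4, 10, "Other"), (5, 10, "Hello")],
)
# The composite index recommended in the answer (index name is made up).
conn.execute("CREATE INDEX idx_topics_dedup ON topics (object_id, title, topic_id)")

# Same condition as the MySQL DELETE: td is doomed if some ti with the same
# object_id and title has a larger topic_id. DISTINCT because td can match
# several newer rows.
doomed = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT td.topic_id FROM topics td"
        " JOIN topics ti ON ti.object_id = td.object_id"
        " AND ti.title = td.title AND ti.topic_id > td.topic_id"
        " ORDER BY td.topic_id"
    )
]
conn.executemany("DELETE FROM topics WHERE topic_id = ?", [(i,) for i in doomed])

remaining = [r[0] for r in conn.execute("SELECT topic_id FROM topics ORDER BY topic_id")]
print(doomed, remaining)  # [1, 2] [3, 4, 5]
```

Row 4 survives because its title differs. Row 3 survives because its object_id differs.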

Score: +5

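This deletes every row whose object_id is duplicated, keeping only the one with the highest topic_id:

 DELETE FROM topics
 WHERE EXISTS (
   SELECT 1 FROM topics newer
   WHERE newer.object_id = topics.object_id
     AND newer.topic_id > topics.topic_id
 );

Note that MySQL rejects this statement (error 1093) because the subquery reads from the table being deleted from. It runs as written in, for example, PostgreSQL and SQLite.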
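Here is a minimal sketch of the correlated EXISTS approach in SQLite, which accepts it. The rows are hypothetical. Like the answer, it matches on object_id only, so row 4 (same object_id, different title) is removed as well. Adding `AND newer.title = topics.title` inside the subquery would restrict the deletion to duplicate titles.

```python
import sqlite3

# Hypothetical data; in-memory SQLite accepts a DELETE whose subquery
# reads the same table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, object_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?)",
    [(1, 10, "Hello"), (2, 10, "Hello"), (3, 20, "Hello"), (4, 10, "Other"), (5, 10, "Hello")],
)

# Delete a row when a newer row (larger topic_id) with the same object_id
# exists. The newest row per object_id never matches, so it always survives.
conn.execute(
    "DELETE FROM topics WHERE EXISTS ("
    " SELECT 1 FROM topics newer"
    " WHERE newer.object_id = topics.object_id"
    " AND newer.topic_id > topics.topic_id)"
)

remaining = conn.execute(
    "SELECT topic_id, object_id, title FROM topics ORDER BY topic_id"
).fetchall()
print(remaining)  # [(3, 20, 'Hello'), (5, 10, 'Hello')]
```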
Score: +1

First, since you have a date field, it is better to identify the latest entries by their date.

This will work:

 SELECT topic_id, object_id, title, url, date
 FROM topics earlier
 WHERE EXISTS (
   SELECT newest.topic_id FROM topics newest
   WHERE newest.date > earlier.date
     AND newest.object_id = earlier.object_id
 );

This selects the rows for which another row with the same object_id and a later date exists.
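The date-based query can be tried out with the sketch below, which runs it against SQLite. The schema is a trimmed-down version of the asker's table, and the rows are hypothetical. Dates are stored as ISO-8601 text, which compares correctly as strings. Rows 5 and 6 tie on the latest date, so neither is selected, and both would survive a DELETE built on this query.

```python
import sqlite3

# Hypothetical, trimmed-down schema; in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, object_id INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?)",
    [
        (1, 10, "2013-01-01"),
        (2, 10, "2013-01-02"),
        (3, 20, "2013-01-03"),
        (4, 10, "2013-01-04"),
        (5, 10, "2013-01-05"),
        (6, 10, "2013-01-05"),  # ties with row 5 on the latest date
    ],
)

# A row is "older" if another row with the same object_id has a later date.
# ISO-8601 text compares correctly, so ">" means "later".
older = [
    r[0]
    for r in conn.execute(
        "SELECT earlier.topic_id FROM topics earlier WHERE EXISTS ("
        " SELECT 1 FROM topics newest"
        " WHERE newest.date > earlier.date"
        " AND newest.object_id = earlier.object_id)"
        " ORDER BY earlier.topic_id"
    )
]
print(older)  # [1, 2, 4]
```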

Score: 0

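 WITH tbl AS (
   SELECT topic_id, object_id,
          ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY topic_id DESC) AS rnum
   FROM topics
 )
 DELETE FROM tbl WHERE rnum > 1;

This is SQL Server syntax. There, deleting from the CTE deletes the corresponding rows in topics.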

For more information, please check this article: http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/
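The ROW_NUMBER() idea can also be run in SQLite 3.25 or newer, which added window functions. The sketch below uses hypothetical rows and deletes the matching ids from the base table, rather than deleting through the CTE as the SQL Server version does.

```python
import sqlite3

# Window functions need SQLite 3.25 or newer.
assert sqlite3.sqlite_version_info >= (3, 25, 0), sqlite3.sqlite_version

# Hypothetical data; in-memory SQLite stands in for the original database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, object_id INTEGER)")
conn.executemany(
    "INSERT INTO topics VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 10), (5, 10)],
)

# Number rows per object_id from newest to oldest; rnum = 1 is the row to keep.
# Delete every other row by its topic_id from the base table.
conn.execute(
    "WITH tbl AS ("
    " SELECT topic_id, ROW_NUMBER() OVER ("
    "  PARTITION BY object_id ORDER BY topic_id DESC) AS rnum"
    " FROM topics)"
    " DELETE FROM topics WHERE topic_id IN (SELECT topic_id FROM tbl WHERE rnum > 1)"
)

remaining = [r[0] for r in conn.execute("SELECT topic_id FROM topics ORDER BY topic_id")]
print(remaining)  # [3, 5]
```

Partitioning by `object_id, title` instead would keep the newest row per duplicate title, as the question asks.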

Score: -1
