How to find rows with equal columns?

If I have a table with two important columns,

CREATE TABLE foo (id INT, a INT, b INT, KEY (a), KEY (b));

How can I find all rows whose a and b values both match those of another row? For example, in this dataset

 id | a | b
----+---+----
  1 | 1 |  2
  2 | 5 | 42
  3 | 1 | 42
  4 | 1 |  2
  5 | 1 |  2
  6 | 1 | 42

I want to return all rows except id=2, since it is unique in (a, b). Basically, I want to find all the offending rows that would prevent

 ALTER TABLE foo ADD UNIQUE (a, b); 

Something better than an n^2 nested for loop would be nice, since my table has 10M rows.

For bonus points: how do I delete all but one row from each group of duplicates? (I do not care which one stays, as long as one remains.)

+4
8 answers
 SELECT * FROM foo first JOIN foo second ON ( first.a = second.a AND first.b = second.b ) AND (first.id <> second.id ) 

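This should return every row for which another row has the same combination of a and b. Note that a row with several duplicates shows up once per matching partner, so add DISTINCT if you want each row only once.

Hopefully you have an index on columns a and b.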
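
As a rough illustration of what the self-join returns on the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption, the question targets MySQL), and the aliases f1/f2 and the helper name self_join_ids are made up for the example.

```python
import sqlite3

# Sample rows (id, a, b) taken from the question.
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

def self_join_ids(rows):
    """Return the ids produced by the self-join, one entry per matching pair."""
    # Assumption: in-memory SQLite instead of the MySQL server from the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    query = """
        SELECT f1.id
        FROM foo f1
        JOIN foo f2
          ON f1.a = f2.a AND f1.b = f2.b AND f1.id <> f2.id
        ORDER BY f1.id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

ids = self_join_ids(ROWS)
print(ids)               # ids repeat when a row has more than one partner
print(sorted(set(ids)))  # each offending id once
```

The repeated ids come from the group (a=1, b=2), which has three members, so each of them pairs with two others.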

+1
 select * from foo where a = b 

Or am I missing something?

===

Update for clarity:

 select * from foo as a inner join foo as b on a.a = b.a AND b.a = b.b and a.id != b.id 

+++++++++++ After the 3rd change in definition:

 select f1.id FROM foo as f1 INNER JOIN foo as f2 ON f1.a = f2.a AND f1.b=f2.b AND f1.id != f2.id 

But I have been downvoted, so check it for yourself.

+1

Shouldn't this work?

 SELECT * FROM foo WHERE a = b 

=== edit ===

What about

 SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1 

=== final edit before I give up on this question ===

 SELECT foo.* FROM foo, ( SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1 ) foo2 WHERE foo.a = foo2.a AND foo.b = foo2.b 
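
To make the GROUP BY / HAVING approach concrete, here is a small sketch run against the sample data from the question. It assumes an in-memory SQLite database via Python's sqlite3 module rather than MySQL, and the helper name duplicate_rows and the alias dup are invented for the example.

```python
import sqlite3

# Sample rows (id, a, b) taken from the question.
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

def duplicate_rows(rows):
    """Return every row whose (a, b) pair occurs more than once."""
    # Assumption: in-memory SQLite instead of the MySQL server from the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    # Group once to find the duplicated (a, b) pairs, then join back
    # to fetch the full rows that carry those pairs.
    query = """
        SELECT foo.*
        FROM foo
        JOIN (SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1) dup
          ON foo.a = dup.a AND foo.b = dup.b
        ORDER BY foo.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(duplicate_rows(ROWS))
```

Unlike the plain self-join, each offending row comes back exactly once, because the derived table holds one entry per duplicated pair.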
+1

Could you clarify what you need to do in the long run? The best solution may depend on this (for example, do you just want to delete all rows with duplicate keys?)

Here is one way to process this table (I am not sure whether MySQL supports it; this is Sybase syntax), if all you need is one row for each duplicated key:

 SELECT MIN(id), A, B FROM FOO GROUP BY A, B HAVING COUNT(*)>1 

For your exact question (although I do not quite understand why you need all the rows except id = 2):

 SELECT F1.* FROM FOO F1 , (SELECT A, B FROM FOO GROUP BY A, B HAVING COUNT(*)>1) F2 WHERE F1.A=F2.A and F1.B=F2.B 

To remove all duplicates, you can, for example, do

 DELETE FOO WHERE NOT EXISTS (SELECT 1 from (SELECT MIN(id) 'min_id' FROM FOO GROUP BY A, B) UNIQUE_IDS WHERE id = min_id) 

Alternatively you can do

  SELECT MIN(id) 'id', A, B INTO TEMPDB..NEW_TABLE FROM FOO GROUP BY A, B
  TRUNCATE TABLE FOO
  -- Drop indices on FOO
  INSERT FOO SELECT * FROM TEMPDB..NEW_TABLE
  -- Recreate indices on FOO
+1

Try the following:

  With s as (Select a,b from foo group by a,b having Count(1)>1) Select foo.* from foo,s where foo.a=s.a and foo.b=s.b 

This query should return the duplicate rows in the foo table.

+1

Here is a different approach:

  select * from foo f1 where exists (
   select * from foo f2 where
     f1.id != f2.id and
     f1.a = f2.a and
     f1.b = f2.b)

I consider this more readable, but with such a huge table you should check the execution plan, because subqueries have a bad reputation for performance.

You should also consider creating an index (without the UNIQUE clause, obviously) to speed up the query. For huge operations it is sometimes better to spend the time creating the index, run the update, and then drop the index again. In this case, I think an index on (a, b) should certainly help a lot.
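
The advice above can be tried out on a small scale. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module as a stand-in for MySQL; the index name idx_foo_a_b and the helper duplicates_via_exists are hypothetical. It creates a plain (non-unique) index on (a, b), runs the EXISTS query, and also collects the plan text so it can be inspected. The plan wording is SQLite-specific and will differ from MySQL's EXPLAIN output.

```python
import sqlite3

# Sample rows (id, a, b) taken from the question.
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

def duplicates_via_exists(rows):
    """Return (ids of duplicated rows, query plan detail lines)."""
    # Assumption: in-memory SQLite instead of the MySQL server from the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    # Non-unique composite index, so the existing duplicates do not block it.
    conn.execute("CREATE INDEX idx_foo_a_b ON foo (a, b)")
    query = """
        SELECT f1.id
        FROM foo f1
        WHERE EXISTS (SELECT 1 FROM foo f2
                      WHERE f1.id != f2.id
                        AND f1.a = f2.a
                        AND f1.b = f2.b)
        ORDER BY f1.id
    """
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids, plan

ids, plan = duplicates_via_exists(ROWS)
print(ids)
for line in plan:
    print(line)
```

On a real 10M-row table the point of reading the plan is to confirm that the inner lookup uses the (a, b) index instead of scanning the table once per outer row.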

0

Your stated goal is to remove all duplicate (a, b) combinations. For this you can use a multi-table DELETE:

 DELETE t1 FROM foo t1 JOIN foo t2 USING (a, b) WHERE t2.id > t1.id 

Before running it, you can check which rows will be deleted with:

 SELECT DISTINCT t1.id FROM foo t1 JOIN foo t2 USING (a, b) WHERE t2.id > t1.id 

The WHERE clause t2.id > t1.id deletes every row except the one with the highest id in each (a, b) group. In your case, only the rows with id 2, 5 and 6 will remain.
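
To check the "2, 5 and 6" claim end to end, here is a small sketch. The multi-table DELETE above is MySQL syntax that SQLite does not accept, so this example swaps in a different but equivalent statement (DELETE ... WHERE id NOT IN (SELECT MAX(id) ... GROUP BY a, b)) on an in-memory SQLite database. That substitution, the index name uq_foo_a_b and the helper name dedupe_keep_highest_id are all assumptions made for illustration.

```python
import sqlite3

# Sample rows (id, a, b) taken from the question.
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

def dedupe_keep_highest_id(rows):
    """Delete duplicates, keep the highest id per (a, b), return remaining ids."""
    # Assumption: in-memory SQLite instead of the MySQL server from the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    # SQLite-friendly equivalent of the multi-table DELETE:
    # keep only the row with the largest id in each (a, b) group.
    conn.execute(
        "DELETE FROM foo "
        "WHERE id NOT IN (SELECT MAX(id) FROM foo GROUP BY a, b)"
    )
    # SQLite counterpart of ALTER TABLE foo ADD UNIQUE (a, b);
    # it raises sqlite3.IntegrityError if any duplicates are left.
    conn.execute("CREATE UNIQUE INDEX uq_foo_a_b ON foo (a, b)")
    remaining = [row[0] for row in conn.execute("SELECT id FROM foo ORDER BY id")]
    conn.close()
    return remaining

print(dedupe_keep_highest_id(ROWS))
```

Creating the unique index right after the delete doubles as a check that no duplicate pair survived.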

0

If the id value does not matter at all in the final result, that is, if you could renumber all the rows and everything would still be fine, and if id is a sequential column, then just run a SELECT DISTINCT on the two columns into a new table, delete all the data from the old table, and then copy the temporary values back.
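
The steps described above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database via Python's sqlite3 module instead of MySQL, id declared as an auto-incrementing primary key, and a scratch table named foo_distinct (a hypothetical name).

```python
import sqlite3

# Sample rows (id, a, b) taken from the question.
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

def rebuild_with_distinct(rows):
    """Rebuild foo from its distinct (a, b) pairs; ids are regenerated."""
    # Assumption: in-memory SQLite instead of the MySQL server from the question.
    conn = sqlite3.connect(":memory:")
    # Assumption: id is auto-generated, so its old values can be thrown away.
    conn.execute(
        "CREATE TABLE foo (id INTEGER PRIMARY KEY AUTOINCREMENT, a INT, b INT)"
    )
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    # 1. Copy the distinct pairs into a scratch table.
    conn.execute("CREATE TABLE foo_distinct AS SELECT DISTINCT a, b FROM foo")
    # 2. Empty the original table.
    conn.execute("DELETE FROM foo")
    # 3. Copy the pairs back; the database assigns fresh ids.
    conn.execute("INSERT INTO foo (a, b) SELECT a, b FROM foo_distinct")
    conn.execute("DROP TABLE foo_distinct")
    pairs = conn.execute("SELECT a, b FROM foo ORDER BY a, b").fetchall()
    conn.close()
    return pairs

print(rebuild_with_distinct(ROWS))
```

The trade-off is that every surviving row gets a new id, so this only fits when nothing else references the old id values.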

0