How to find rows with equal columns?

Question

I have a table with two important columns:

CREATE TABLE foo (id INT, a INT, b INT, KEY (a), KEY (b));

How can I find all rows whose a and b both match those of at least one other row? For example, in this dataset

 id | a | b
 -----------
  1 | 1 | 2
  2 | 5 | 42
  3 | 1 | 42
  4 | 1 | 2
  5 | 1 | 2
  6 | 1 | 42

I want to return all rows except id=2, since it is unique in (a, b). Basically, I want to find all the offending rows that would block

 ALTER TABLE foo ADD UNIQUE (a, b);

Something better than an n^2 nested for loop would be nice, since my table has 10M rows.

For bonus points: how do I delete all but one row from each group of duplicates? (I don't care which one remains.)

+4

sql mysql aggregate

Paul tarjan Sep 17 '09 at 4:46

8 answers

 select * from foo where a = b

Or am I missing something?

===

Update for clarity:

 select * from foo as a inner join foo as b on a.a = b.a AND a.b = b.b and a.id != b.id

+++++++++++ After the third change to the question:

 select f1.id FROM foo as f1 INNER JOIN foo as f2 ON f1.a = f2.a AND f1.b=f2.b AND f1.id != f2.id

But I'm worn out, so test it yourself.

+1

timdev Sep 17 '09 at 4:55

Shouldn't this work?

 SELECT * FROM foo WHERE a = b

=== edit ===

What about

 SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1

=== final edit before I give up on this question ===

 SELECT foo.* FROM foo, ( SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1 ) foo2 WHERE foo.a = foo2.a AND foo.b = foo2.b
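To try this on the sample data from the question, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the alias dup is only an illustrative name. The query is the same derived-table idea, written with an explicit JOIN.

```python
import sqlite3

# Sample rows from the question: (id, a, b)
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)

# The derived table lists each (a, b) pair that occurs more than once;
# joining it back to foo returns every row that belongs to such a pair.
query = """
    SELECT foo.id
    FROM foo
    JOIN (SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1) AS dup
      ON foo.a = dup.a AND foo.b = dup.b
    ORDER BY foo.id
"""
duplicate_ids = [row[0] for row in conn.execute(query)]
print(duplicate_ids)
```

On the sample data this lists every id except 2, which is the result the question asks for. The grouping makes one pass over the table instead of comparing every row with every other row.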

+1

Lukman Sep 17 '09 at 4:56

Could you clarify what you need to do in the long run? The best solution may depend on this (for example, do you just want to delete all rows with duplicate keys?).

One way to process this table, if all you need is one row per duplicated key (this is Sybase syntax; I'm not sure whether MySQL supports it):

 SELECT MIN(id), A, B FROM FOO GROUP BY A, B HAVING COUNT(*)>1

For your exact question (although I don't quite understand why you need all rows except id = 2):

 SELECT F1.* FROM FOO F1 , (SELECT A, B FROM FOO GROUP BY A, B HAVING COUNT(*)>1) F2 WHERE F1.A=F2.A and F1.B=F2.B

To remove all duplicates, you can, for example, do

 DELETE FOO WHERE NOT EXISTS (SELECT 1 from (SELECT MIN(id) 'min_id' FROM FOO GROUP BY A, B) UNIQUE_IDS WHERE id = min_id)

Alternatively you can do

  SELECT MIN(id) 'id', A, B INTO TEMPDB..NEW_TABLE FROM FOO GROUP BY A, B
  TRUNCATE TABLE FOO
  -- Drop indices on FOO
  INSERT FOO SELECT * FROM TEMPDB..NEW_TABLE
  -- Recreate indices on FOO

+1

DVK Sep 17 '09 at 5:13

Try the following:

  With s as (Select a,b from foo group by a,b having Count(1)>1) Select foo.* from foo,s where foo.a=s.a and foo.b=s.b

This query should return the duplicate rows in the foo table.

+1

Himadri Sep 17 '09 at 5:29

Here is a different approach:

  select * from foo f1 where exists (
   select * from foo f2 where
     f1.id != f2.id and
     f1.a = f2.a and
     f1.b = f2.b)

Although I consider this more readable, with such a huge table you should check the execution plan, since subqueries have a bad reputation for performance.

You should also consider creating an index (without the UNIQUE clause, obviously) to speed up the query. For huge operations it is sometimes better to spend the time creating an index, doing the update, and then dropping the index. In this case, I think an index on (a, b) should certainly help a lot.
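As a rough illustration of the index advice, here is a small sketch on the question's sample data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the index name idx_foo_a_b is made up for the example. In MySQL you would inspect the plan with EXPLAIN rather than SQLite's EXPLAIN QUERY PLAN, and the plan text differs between engines and versions.

```python
import sqlite3

# Sample rows from the question: (id, a, b)
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)

# A plain (non-unique) composite index; a UNIQUE one could not be created
# while the duplicates are still present.
conn.execute("CREATE INDEX idx_foo_a_b ON foo (a, b)")

# The EXISTS query from this answer, selecting only the ids.
query = """
    SELECT f1.id
    FROM foo AS f1
    WHERE EXISTS (
        SELECT 1 FROM foo AS f2
        WHERE f2.a = f1.a AND f2.b = f1.b AND f2.id <> f1.id
    )
    ORDER BY f1.id
"""
exists_ids = [row[0] for row in conn.execute(query)]
print(exists_ids)

# Print the engine's plan; the last column holds the human-readable detail.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row[-1])
```

On six rows the plan says little about real performance, so the useful check is to run the equivalent EXPLAIN against the actual 10M-row table.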

0

opensas Sep 17 '09 at 5:15

Your stated goal is to remove all duplicate (a, b) combinations. For this you can use a multi-table DELETE:

 DELETE t1 FROM foo t1 JOIN foo t2 USING (a, b) WHERE t2.id > t1.id

Before running it, you can check which rows will be deleted with:

 SELECT DISTINCT t1.id FROM foo t1 JOIN foo t2 USING (a, b) WHERE t2.id > t1.id

The WHERE clause t2.id > t1.id deletes every row except the one with the highest id in each (a, b) group. In your case, only rows with id equal to 2, 5, or 6 will remain.
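The claim about the surviving rows can be checked on the sample data. This is only a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, and because SQLite has no multi-table DELETE, the statement is rewritten as a correlated EXISTS with the same t2.id > t1.id condition. The index name uniq_foo_a_b is made up for the example.

```python
import sqlite3

# Sample rows from the question: (id, a, b)
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)

# Substitute for MySQL's multi-table DELETE: remove a row whenever another
# row has the same (a, b) and a higher id, so the highest id survives.
conn.execute("""
    DELETE FROM foo
    WHERE EXISTS (
        SELECT 1 FROM foo AS t2
        WHERE t2.a = foo.a AND t2.b = foo.b AND t2.id > foo.id
    )
""")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM foo ORDER BY id")]
print(remaining_ids)

# With the duplicates gone, the unique constraint can now be added.
conn.execute("CREATE UNIQUE INDEX uniq_foo_a_b ON foo (a, b)")
```

Flipping the comparison to t2.id < t1.id would keep the lowest id in each group instead.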

0

Josh davis Sep 17 '09 at 12:40

If the id value doesn't matter in the end, that is, if you could renumber all the rows and everything would still be fine (for example, if id is just a sequential column), then simply SELECT DISTINCT on the two columns into a new table, delete all the data from the old table, and then copy the temporary values back.
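A sketch of that rebuild, assuming SQLite through Python's sqlite3 module rather than MySQL. The table name foo_dedup is hypothetical, and the new id values are simply taken from SQLite's rowid, so the original ids are lost, as this approach expects.

```python
import sqlite3

# Sample rows from the question: (id, a, b)
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)

# 1. Copy the distinct (a, b) pairs into a scratch table.
conn.execute("CREATE TABLE foo_dedup AS SELECT DISTINCT a, b FROM foo")
# 2. Empty the original table.
conn.execute("DELETE FROM foo")
# 3. Copy the pairs back, renumbering id from the scratch table's rowid.
conn.execute("INSERT INTO foo (id, a, b) SELECT rowid, a, b FROM foo_dedup")
# 4. Drop the scratch table.
conn.execute("DROP TABLE foo_dedup")

pairs = sorted(conn.execute("SELECT a, b FROM foo").fetchall())
new_ids = sorted(row[0] for row in conn.execute("SELECT id FROM foo"))
print(pairs, new_ids)
```

This only makes sense when nothing else refers to the old id values.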

0

Kev Sep 17 '09 at 12:47

James anderson · Accepted Answer · 2009-09-17T05:02:35+0000

 SELECT * FROM foo first JOIN foo second ON ( first.a = second.a AND first.b = second.b ) AND (first.id <> second.id )

This should return every row for which more than one row has the same combination of a and b.

I hope you have an index on columns a and b.
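One thing to be aware of with a self-join: a row that has several duplicates comes back once per matching partner. The sketch below shows this on the question's sample data. It assumes SQLite through Python's sqlite3 module in place of MySQL and uses the aliases f1 and f2 instead of first and second. Adding DISTINCT collapses the repeats.

```python
import sqlite3

# Sample rows from the question: (id, a, b)
ROWS = [(1, 1, 2), (2, 5, 42), (3, 1, 42), (4, 1, 2), (5, 1, 2), (6, 1, 42)]

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INT, a INT, b INT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)

# Self-join on equal (a, b) with different ids.
join_sql = """
    FROM foo AS f1
    JOIN foo AS f2
      ON f1.a = f2.a AND f1.b = f2.b AND f1.id <> f2.id
"""

# Without DISTINCT, each row appears once for every partner it matches.
joined_ids = [row[0] for row in conn.execute("SELECT f1.id" + join_sql)]

# With DISTINCT, each duplicated row appears exactly once.
distinct_ids = [
    row[0]
    for row in conn.execute("SELECT DISTINCT f1.id" + join_sql + " ORDER BY f1.id")
]
print(len(joined_ids), distinct_ids)
```

Here the (1, 2) group has three rows, so each of them matches two partners. Groups with many duplicates can therefore inflate the join result considerably.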
