I know this is an old thread, but I have a somewhat dirty method that is much faster and more configurable. In terms of speed, I would say roughly 10 seconds instead of 100 seconds (about 10:1).
My method requires all the dirty stuff you were trying to avoid:
- GROUP BY (and HAVING)
- GROUP_CONCAT with ORDER BY
- 2 temporary tables
- using files on disk!
- deleting the file afterwards somehow (PHP?)
But when you are talking about MILLIONS of rows (or, in my case, ten million), it's worth it.
It is not much of an explanation, since the comments were written in Portuguese, but here is my example:
EDIT: if I get comments, I will explain further how it works :)
    START TRANSACTION;

    -- raise this session variable to its maximum so that long id lists
    -- are not truncated by the GROUP_CONCAT calls below
    SET SESSION group_concat_max_len=4294967295;

    DROP temporary table if exists to_delete;

    CREATE temporary table to_delete as (
        SELECT
            -- select all duplicate IDs except the one that stays in the DB
            -- the order of the IDs is given by "ORDER BY campos_ordenacao DESC",
            -- where the first ID is the one that is kept
            right(
                group_concat(id ORDER BY campos_ordenacao DESC SEPARATOR ','),
                length(group_concat(id ORDER BY campos_ordenacao DESC SEPARATOR ','))
                    - locate(",", group_concat(id ORDER BY campos_ordenacao DESC SEPARATOR ','))
            ) as ids,
            count(*) as c
        -- table to remove duplicates from
        FROM teste_dup
        -- fields used to identify duplicates
        group by test_campo1, test_campo2, teste_campoN
        having count(*) > 1 -- is a duplicate
    );

    -- write all the ids to delete to a file
    select group_concat(ids SEPARATOR ',') from to_delete INTO OUTFILE 'sql.dat';

    DROP temporary table if exists del3;
    create temporary table del3 as (select CAST(1 as signed) as ix LIMIT 0);

    -- load the ids to delete from the file into a temporary table
    load data infile 'sql.dat' INTO TABLE del3
    LINES TERMINATED BY ',';

    alter table del3 add index(ix);

    -- delete the selected ids
    DELETE teste_dup -- table
    from teste_dup -- table
    join del3 on id=ix;

    COMMIT;
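The least obvious part of the script is the right(...)/length(...)/locate(...) expression in the first query: it drops the first id of each group (the row that is kept) and returns the rest of the list (the rows to delete). A minimal sketch of just that step, assuming MySQL and using a made-up literal `@ids` in place of the real GROUP_CONCAT result:

```sql
-- Hypothetical value standing in for
-- GROUP_CONCAT(id ORDER BY campos_ordenacao DESC SEPARATOR ',') of one duplicate group
SET @ids = '9,7,3';

-- Same expression as in the script: everything after the first comma
SELECT RIGHT(@ids, LENGTH(@ids) - LOCATE(',', @ids)) AS ids_to_delete;  -- '7,3'

-- An equivalent, shorter form of the same step
SELECT SUBSTRING(@ids, LOCATE(',', @ids) + 1) AS ids_to_delete;         -- '7,3'
```

Here id 9 sorts first and survives, while 7 and 3 end up in the list that is written to the file and later deleted.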
JDuarteDJ, Jul 16 '14 at 18:40