I'm a relative newbie when it comes to databases. We use MySQL, and I'm currently trying to speed up an SQL statement that seems to take a while to run. I searched SO for a similar question but didn't find one.
The goal is to remove all rows in table A that have a matching id in table B.
I am currently doing the following:
DELETE FROM a WHERE EXISTS (SELECT b.id FROM b WHERE b.id = a.id);
Table A has about 100K rows and table B has about 22K rows. The "id" column is the PK for both tables.
This statement takes about 3 minutes to run on my test box (Pentium D, XP SP3, 2 GB RAM, MySQL 5.0.67). That seems slow to me. Maybe it isn't, but I was hoping to speed things up. Is there a better/faster way to accomplish this?
EDIT:
Some additional information that might be helpful. Tables A and B have the same structure, because I created table B with:
CREATE TABLE b LIKE a;
Table A (and therefore table B) has several indexes to speed up the queries made against it. Again, I'm a relative newbie to database work and still a student, so I don't know what effect, if any, they have here. I assume they do have an effect, since the index entries have to be removed too, right? I was also wondering whether any other database settings might affect the speed.
Also, I'm using InnoDB.
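On the index question: InnoDB keeps each row in the primary-key index plus one entry per secondary index, so deleting a row also means removing its entry from every secondary index, and more indexes mean more work per deleted row. To see exactly which indexes a table carries, something like the following should work. It is a sketch against a hypothetical `scratch` database holding a cut-down copy of table a (only id, h and l, with two of the secondary indexes); point the queries at `frobozz` to inspect the real table.

```sql
-- Hypothetical scratch database, so nothing in a real schema is touched.
CREATE DATABASE IF NOT EXISTS scratch;
USE scratch;

DROP TABLE IF EXISTS a;
CREATE TABLE a (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  h  INT UNSIGNED DEFAULT NULL,
  l  VARCHAR(45) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_l (l),
  KEY idx_h (h)
) ENGINE=InnoDB;

-- One output row per indexed column; every index listed is maintained on DELETE.
SHOW INDEX FROM a;

-- The same information from information_schema, collapsed to one row per index.
SELECT INDEX_NAME,
       GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS index_columns
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'scratch' AND TABLE_NAME = 'a'
GROUP BY INDEX_NAME;
```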
Here is some additional information you might find helpful.
Table A has a structure similar to this (I've sanitized it a bit):
DROP TABLE IF EXISTS `frobozz`.`a`;
CREATE TABLE `frobozz`.`a` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `fk_g` varchar(30) NOT NULL,
  `h` int(10) unsigned default NULL,
  `i` longtext,
  `j` bigint(20) NOT NULL,
  `k` bigint(20) default NULL,
  `l` varchar(45) NOT NULL,
  `m` int(10) unsigned default NULL,
  `n` varchar(20) default NULL,
  `o` bigint(20) NOT NULL,
  `p` tinyint(1) NOT NULL,
  PRIMARY KEY USING BTREE (`id`),
  KEY `idx_l` (`l`),
  KEY `idx_h` USING BTREE (`h`),
  KEY `idx_m` USING BTREE (`m`),
  KEY `idx_fk_g` USING BTREE (`fk_g`),
  KEY `fk_g_frobozz` (`id`,`fk_g`),
  CONSTRAINT `fk_g_frobozz` FOREIGN KEY (`fk_g`) REFERENCES `frotz` (`g`)
) ENGINE=InnoDB AUTO_INCREMENT=179369 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
I suspect part of the problem is the number of indexes on this table. Table B is similar to table A, although it only contains the columns id and h.
Also, the profiling results are as follows:
Status                           Duration
starting                         0.000018
checking query cache for query   0.000044
checking permissions             0.000005
Opening tables                   0.000009
init                             0.000019
optimizing                       0.000004
executing                        0.000043
end                              0.000005
end                              0.000002
query end                        0.000003
freeing items                    0.000007
logging slow query               0.000002
cleaning up                      0.000002
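A profile like the one above can be collected per session with MySQL's profiling commands (available in the community server from 5.0.37, and deprecated in later releases in favour of the Performance Schema). A minimal sketch, assuming hypothetical scratch copies of a and b with only an id column:

```sql
-- Hypothetical scratch copies of a and b.
CREATE DATABASE IF NOT EXISTS scratch;
USE scratch;
DROP TABLE IF EXISTS b;
DROP TABLE IF EXISTS a;
CREATE TABLE a (id BIGINT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE b LIKE a;
INSERT INTO a (id) VALUES (1), (2), (3), (4);
INSERT INTO b (id) VALUES (2), (4);

-- Turn on profiling for this session only.
SET profiling = 1;

DELETE FROM a WHERE EXISTS (SELECT b.id FROM b WHERE b.id = a.id);

-- Lists recent statements with their Query_ID and total Duration.
SHOW PROFILES;
-- Per-stage breakdown; substitute the Query_ID SHOW PROFILES reports for the DELETE.
SHOW PROFILE FOR QUERY 1;

SET profiling = 0;
```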
SOLVED:
Thanks for all the answers and comments. They certainly got me thinking about the problem. Kudos to dotjoe for getting me to step back from the problem with a simple question: "Do any other tables reference a.id?"
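dotjoe's question can also be asked of information_schema. The sketch below uses a hypothetical `scratch` database (swap in `frobozz` for the real schema), a hypothetical trigger name `a_after_delete`, and a table c whose other_id column holds a.id values. The trigger query finds the trigger, but the foreign-key query comes back empty, because KEY_COLUMN_USAGE only lists declared FOREIGN KEY constraints. That undeclared case is also the one to watch for indexing: for a declared FOREIGN KEY, InnoDB requires an index on the referencing columns and creates one automatically if none exists.

```sql
-- Hypothetical setup: c.other_id stores a.id values without a declared FOREIGN KEY.
CREATE DATABASE IF NOT EXISTS scratch;
USE scratch;
DROP TABLE IF EXISTS c;
DROP TABLE IF EXISTS a;
CREATE TABLE a (id BIGINT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE c (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  other_id BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;
-- Single-statement trigger body, so no DELIMITER change is needed.
CREATE TRIGGER a_after_delete AFTER DELETE ON a
  FOR EACH ROW DELETE FROM c WHERE c.other_id = OLD.id;

-- 1. Triggers that fire when rows are deleted from a.
SELECT TRIGGER_NAME, ACTION_TIMING, ACTION_STATEMENT
FROM information_schema.TRIGGERS
WHERE EVENT_OBJECT_SCHEMA = 'scratch'
  AND EVENT_OBJECT_TABLE = 'a'
  AND EVENT_MANIPULATION = 'DELETE';

-- 2. Declared foreign keys in other tables that reference a.
--    Empty here: c.other_id refers to a.id only by convention.
SELECT TABLE_NAME, COLUMN_NAME, CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_SCHEMA = 'scratch'
  AND REFERENCED_TABLE_NAME = 'a';
```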
The problem was that a DELETE TRIGGER was defined on table A, and it called a stored procedure that updated two other tables, C and D. Table C had an FK back to a.id, and after doing some work related to that id, the stored procedure ran the statement
DELETE FROM c WHERE c.other_id = theId;
I looked into the EXPLAIN statement and rewrote that DELETE as
EXPLAIN SELECT * FROM c WHERE c.other_id = 12345;
That let me see what the statement was doing, and EXPLAIN returned:
           id: 1
  select_type: SIMPLE
        table: c
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2633
        Extra: Using where
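This told me it was an expensive operation: a full table scan (type ALL, no usable key) over roughly 2,633 rows. Since it was going to be called 22,500 times (once for each row being deleted in this data set), that added up to something on the order of 2,633 × 22,500 ≈ 59 million row reads, which was the problem. As soon as I created an INDEX on that other_id column and re-ran the EXPLAIN, I got:
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: Index_1
          key: Index_1
      key_len: 8
          ref: const
         rows: 1
        Extra: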
Much better. Really, really great.
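The whole chain can be reproduced in miniature. This is a hypothetical reconstruction in a `scratch` database: the procedure and trigger names (`cleanup_for_a`, `a_after_delete`) are invented, the real procedure also updated table D and did other per-id work that is omitted here, and the procedure filters c on other_id, the column that ended up indexed. Each row deleted from a fires the trigger once, so the DELETE on c runs once per deleted row, and the fix is the single CREATE INDEX.

```sql
CREATE DATABASE IF NOT EXISTS scratch;
USE scratch;

DROP TABLE IF EXISTS c;
DROP TABLE IF EXISTS a;
DROP PROCEDURE IF EXISTS cleanup_for_a;

CREATE TABLE a (id BIGINT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE c (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  other_id BIGINT UNSIGNED NOT NULL  -- holds a.id values, no index yet
) ENGINE=InnoDB;

DELIMITER //
CREATE PROCEDURE cleanup_for_a(IN theId BIGINT UNSIGNED)
BEGIN
  -- Without an index on other_id this scans all of c on every call.
  DELETE FROM c WHERE c.other_id = theId;
END//

-- Fires once per row deleted from a, i.e. ~22,500 calls for the question's data.
CREATE TRIGGER a_after_delete AFTER DELETE ON a
FOR EACH ROW
BEGIN
  CALL cleanup_for_a(OLD.id);
END//
DELIMITER ;

INSERT INTO a (id) VALUES (12344), (12345), (12346);
INSERT INTO c (other_id) VALUES (12344), (12345), (12345), (12346);

-- Before: expect type ALL and key NULL, as in the first EXPLAIN output above.
EXPLAIN SELECT * FROM c WHERE c.other_id = 12345;

-- The fix: index the column the per-row DELETE filters on.
CREATE INDEX Index_1 ON c (other_id);

-- After: expect type ref and key Index_1 (on a table this tiny the optimizer
-- may still prefer a scan; on a realistically sized table it should use the index).
EXPLAIN SELECT * FROM c WHERE c.other_id = 12345;

-- Deleting from a removes the matching c rows through the trigger.
DELETE FROM a WHERE id = 12345;
```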
I added that Index_1, and my delete times are now in line with the times mattkemp reported. It was a subtle mistake on my part, introduced when some additional functionality was added at the last minute. As Daniel pointed out, most of the proposed alternative DELETE/SELECT statements ended up taking essentially the same amount of time, and, as soulmerge noted, my statement was pretty much the best I could construct for what I needed to do. Once I provided an index on the other table, C, my DELETEs were fast.
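For reference, these are the usual equivalent ways to write this delete in MySQL, sketched on hypothetical scratch tables with only an id column. The third uses MySQL's multi-table DELETE syntax. All three remove the same rows; as noted above, rewriting the statement made little difference here because the cost was in the per-row work on table C.

```sql
CREATE DATABASE IF NOT EXISTS scratch;
USE scratch;
DROP TABLE IF EXISTS b;
DROP TABLE IF EXISTS a;
CREATE TABLE a (id BIGINT UNSIGNED NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE b LIKE a;
INSERT INTO a (id) VALUES (1), (2), (3), (4), (5);
INSERT INTO b (id) VALUES (2), (4);

-- Equivalent forms; any one is enough (running the others afterwards deletes nothing more).

-- 1. Correlated EXISTS, as in the question.
DELETE FROM a WHERE EXISTS (SELECT b.id FROM b WHERE b.id = a.id);

-- 2. IN subquery.
DELETE FROM a WHERE a.id IN (SELECT b.id FROM b);

-- 3. MySQL multi-table DELETE: delete the rows of a that join to b.
DELETE a FROM a INNER JOIN b ON b.id = a.id;
```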
Post-mortem:
Two lessons came out of this exercise. First, I clearly had not used the EXPLAIN statement to understand the impact of my SQL queries. That's a rookie mistake, so I'm not going to beat myself up over it; I'll learn from it. Second, the offending code was the result of a “quick fix”, and inadequate design and testing meant the problem didn't surface earlier. Had I generated several large test data sets as input for this new functionality, I would not have wasted my time or yours. My database-side testing lacked the depth of my application-side testing. Now I have the opportunity to improve that.
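Building a large test set takes only a few statements. One common approach, sketched here with a hypothetical `fill_a` procedure on a cut-down scratch copy of a (only id and h): seed one row, repeatedly double the table with INSERT ... SELECT, then copy a random subset of ids into b so the sizes resemble the question's (about 100K and 22K rows).

```sql
CREATE DATABASE IF NOT EXISTS scratch;
USE scratch;
DROP TABLE IF EXISTS b;
DROP TABLE IF EXISTS a;
DROP PROCEDURE IF EXISTS fill_a;

CREATE TABLE a (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  h  INT UNSIGNED DEFAULT NULL
) ENGINE=InnoDB;
CREATE TABLE b LIKE a;

DELIMITER //
-- Seeds one row, then doubles a `doublings` times: 2^doublings rows in total.
CREATE PROCEDURE fill_a(IN doublings INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  INSERT INTO a (h) VALUES (FLOOR(RAND() * 1000));
  WHILE i < doublings DO
    -- MySQL allows selecting from the insert target (it buffers via a temporary table).
    INSERT INTO a (h) SELECT FLOOR(RAND() * 1000) FROM a;
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

CALL fill_a(17);  -- 2^17 = 131,072 rows

-- A random 22,000-row subset of a's ids becomes the "to delete" list.
INSERT INTO b (id, h) SELECT id, h FROM a ORDER BY RAND() LIMIT 22000;
```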
Ref: EXPLAIN Statement (MySQL Reference Manual)