I'm trying to understand why this query rewrite makes such a big difference (more than 100 times faster), so that I can apply the same reasoning to other queries.
Using MySQL 4.1. RESET QUERY CACHE and FLUSH TABLES were executed before every query, and the timings are reproducible when the queries are run one after another. The only obvious difference I can see in EXPLAIN is that the JOIN version only has to look up 5 rows. But does that fully explain the speedup? Both queries use the forum_stickies index (partially) to check whether topics are deleted (topic_status = 0).
[Screenshots: EXPLAIN output for both queries]
Slow query: 0.7 seconds (cache cleared)
SELECT SQL_NO_CACHE forum_id, topic_id FROM bb_topics
WHERE topic_last_post_id IN
(SELECT SQL_NO_CACHE MAX(topic_last_post_id) AS topic_last_post_id
FROM bb_topics WHERE topic_status=0 GROUP BY forum_id)
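A rough way to see why this form can be slow: MySQL versions before 5.6 are documented to rewrite an IN (subquery) like this one as a correlated subquery, which EXPLAIN reports as DEPENDENT SUBQUERY, so the grouped subquery is re-evaluated for each outer row. The Python sketch below models that plan on a small hypothetical table. The Topic layout, the toy data and the rows_read counter are illustrative assumptions, not MySQL internals; in the model the inner loop reads every row, whereas MySQL would use forum_stickies to read only topic_status = 0 rows (here every row has status 0).

```python
from collections import namedtuple

# Hypothetical miniature of bb_topics with only the columns the query uses.
Topic = namedtuple("Topic", "forum_id topic_id topic_last_post_id topic_status")


def dependent_in_plan(topics):
    """Model of WHERE topic_last_post_id IN (SELECT MAX(...) ... GROUP BY forum_id)
    run as a DEPENDENT SUBQUERY: the GROUP BY is recomputed for every outer row."""
    rows_read = 0
    result = []
    for outer in topics:                  # full scan of the outer bb_topics
        rows_read += 1
        max_per_forum = {}
        for inner in topics:              # the subquery runs again for this row
            rows_read += 1
            if inner.topic_status == 0:
                best = max_per_forum.get(inner.forum_id)
                if best is None or inner.topic_last_post_id > best:
                    max_per_forum[inner.forum_id] = inner.topic_last_post_id
        if outer.topic_last_post_id in max_per_forum.values():
            result.append((outer.forum_id, outer.topic_id))
    return result, rows_read


# 3 forums x 4 topics; the newest topic in each forum has the highest post id.
topics = [Topic(f, f * 10 + t, f * 100 + t, 0)
          for f in range(1, 4) for t in range(1, 5)]
rows, cost = dependent_in_plan(topics)
print(rows)   # [(1, 14), (2, 24), (3, 34)]
print(cost)   # 156 = 12 outer rows + 12 * 12 inner rows
```

The inner term grows with the square of the table size, which is why a plan of this shape falls further behind as bb_topics grows.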
Fast query: 0.004 seconds or less (cache cleared)
SELECT SQL_NO_CACHE forum_id, topic_id FROM bb_topics AS s1
JOIN
(SELECT SQL_NO_CACHE MAX(topic_last_post_id) AS topic_last_post_id
FROM bb_topics WHERE topic_status=0 GROUP BY forum_id) AS s2
ON s1.topic_last_post_id=s2.topic_last_post_id
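By contrast, a subquery in the FROM clause (a derived table, shown as DERIVED in EXPLAIN) is evaluated once and stored in a temporary table before the join. The Python sketch below models that strategy on a small hypothetical table; the Topic layout, toy data and rows_read counter are illustrative assumptions, not MySQL's actual algorithm.

```python
from collections import namedtuple

# Hypothetical miniature of bb_topics with only the columns the query uses.
Topic = namedtuple("Topic", "forum_id topic_id topic_last_post_id topic_status")


def derived_join_plan(topics):
    """Model of JOIN (SELECT MAX(...) ... GROUP BY forum_id) AS s2: the derived
    table is materialized once, then joined to s1."""
    rows_read = 0
    # Step 1: evaluate the subquery once into a "temporary table".
    max_per_forum = {}
    for inner in topics:
        rows_read += 1
        if inner.topic_status == 0:
            best = max_per_forum.get(inner.forum_id)
            if best is None or inner.topic_last_post_id > best:
                max_per_forum[inner.forum_id] = inner.topic_last_post_id
    s2 = list(max_per_forum.values())     # one row per forum
    # Step 2: a single scan of s1, comparing each row with the few s2 rows.
    result = []
    for outer in topics:
        rows_read += 1
        for value in s2:                  # JOIN: one output row per match
            if outer.topic_last_post_id == value:
                result.append((outer.forum_id, outer.topic_id))
    return result, rows_read


# 3 forums x 4 topics; the newest topic in each forum has the highest post id.
topics = [Topic(f, f * 10 + t, f * 100 + t, 0)
          for f in range(1, 4) for t in range(1, 5)]
rows, cost = derived_join_plan(topics)
print(rows)   # [(1, 14), (2, 24), (3, 34)]
print(cost)   # 24 = 12 rows to build s2 + 12 rows scanned in s1
```

Here the work is two linear passes over the table, no matter how many outer rows there are.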
Note that there is no index on the most important column (topic_last_post_id), but adding one doesn't help (the subquery results are stored for reuse anyway).
Is the answer simply that the first query has to scan topic_last_post_id TWICE, the second time to match the rows against the subquery results? If so, why is it more than 100 times slower rather than about twice as slow?
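A back-of-the-envelope cost model suggests why the gap is a ratio that grows with the table rather than a constant factor of two. This is an approximation under stated assumptions, not MySQL's cost formula: let N be the number of rows in bb_topics, M the number of topic_status = 0 rows the subquery reads, and F the number of forums (rows in the derived table), and assume the IN version runs as a dependent subquery while the JOIN version matches each derived row by scanning s1 (there is no index on topic_last_post_id).

```latex
\begin{aligned}
C_{\mathrm{IN}}   &\approx N \cdot M
  && \text{grouped subquery re-evaluated for each of the } N \text{ outer rows} \\
C_{\mathrm{JOIN}} &\approx M + F \cdot N
  && \text{subquery evaluated once, then } F \text{ derived rows matched against } s_1 \\
\frac{C_{\mathrm{IN}}}{C_{\mathrm{JOIN}}} &\approx \frac{N M}{M + F N}
  \approx \frac{N}{1 + F} \quad \text{when } M \approx N
\end{aligned}
```

Under this model the IN version costs roughly N/(1 + F) times as much, so the gap widens linearly as bb_topics grows. For example, a hypothetical table with N = 1,200 and F = 5 gives a ratio of about 200.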
(Less important: I'm curious why the first query still takes so long even when I do add an index on topic_last_post_id.)
Update: after a lot more searching, I found this Stack Overflow thread, which covers this under Subqueries and joins.
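When reusing this rewrite on other queries, note that IN (subquery) and JOIN (subquery) are only interchangeable when the derived table has no duplicate values: IN behaves as a semi-join and returns each outer row at most once, while a JOIN returns one row per match. The sketch below uses Python's built-in sqlite3 module (not MySQL, so it says nothing about MySQL's timings) on a hypothetical table in which two forums share the same maximum topic_last_post_id; adding DISTINCT to the derived table restores equivalence.

```python
import sqlite3

IN_SQL = """
SELECT forum_id, topic_id FROM bb_topics
WHERE topic_last_post_id IN
  (SELECT MAX(topic_last_post_id) FROM bb_topics
   WHERE topic_status = 0 GROUP BY forum_id)
"""

JOIN_SQL = """
SELECT forum_id, topic_id FROM bb_topics AS s1
JOIN (SELECT MAX(topic_last_post_id) AS topic_last_post_id FROM bb_topics
      WHERE topic_status = 0 GROUP BY forum_id) AS s2
  ON s1.topic_last_post_id = s2.topic_last_post_id
"""

# Same join, but duplicate maxima are collapsed before joining.
JOIN_DISTINCT_SQL = """
SELECT forum_id, topic_id FROM bb_topics AS s1
JOIN (SELECT DISTINCT MAX(topic_last_post_id) AS topic_last_post_id
      FROM bb_topics WHERE topic_status = 0 GROUP BY forum_id) AS s2
  ON s1.topic_last_post_id = s2.topic_last_post_id
"""


def rows(conn, sql):
    # Sort so the comparison does not depend on the plan SQLite picks.
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bb_topics (forum_id INTEGER, topic_id INTEGER, "
             "topic_last_post_id INTEGER, topic_status INTEGER)")
conn.executemany("INSERT INTO bb_topics VALUES (?, ?, ?, ?)", [
    (1, 10, 50, 0),
    (1, 11, 40, 0),
    (2, 20, 50, 0),   # hypothetical: same last post id as topic 10
    (2, 21, 30, 0),
    (3, 30, 60, 1),   # topic_status != 0, ignored by the subquery
    (3, 31, 55, 0),
])

in_rows = rows(conn, IN_SQL)
join_rows = rows(conn, JOIN_SQL)
join_distinct_rows = rows(conn, JOIN_DISTINCT_SQL)
print(in_rows)             # [(1, 10), (2, 20), (3, 31)]
print(join_rows)           # [(1, 10), (1, 10), (2, 20), (2, 20), (3, 31)]
print(join_distinct_rows)  # [(1, 10), (2, 20), (3, 31)]
```

In the original schema each forum's maximum is presumably a distinct post id, in which case the plain JOIN already returns the same rows as the IN query.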