I have three tables: categories, articles and event_ articles with the following structure
categories: id, name (100,000 rows) articles: id, category_id (6000 rows) article_events: id, article_id, status_id (20,000 rows)
The highest article_events.id for each article line describes the current status of each article.
I am returning a table of categories and the number of articles in them with the most recent status_id event of '1'.
That I am still working, but rather slow (10 seconds) with the size of my tables. Wonder if there is a way to do it faster. As far as I know, all tables have corresponding indexes.
SELECT c.id, c.name, SUM(CASE WHEN e.status_id = 1 THEN 1 ELSE 0 END) article_count FROM categories c LEFT JOIN articles a ON a.category_id = c.id LEFT JOIN ( SELECT article_id, MAX(id) event_id FROM article_events GROUP BY article_id ) most_recent ON most_recent.article_id = a.id LEFT JOIN article_events e ON most_recent.event_id = e.id GROUP BY c.id
Basically, I need to join the event table twice, since the status_id query along with MAX (id) simply returns the first status_id found, and not the one associated with the MAX (id) string.
How to make it better? or do I just need to live with 10 seconds? Thanks!
Edit:
Here is my EXPLAIN for the request:
ID | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra --------------------------------------------------------------------------------------------------------------------------- 1 | PRIMARY | c | index | NULL | PRIMARY | 4 | NULL | 124044 | Using index; Using temporary; Using filesort 1 | PRIMARY | a | ref | category_id | category_id | 4 | c.id | 3 | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6351 | 1 | PRIMARY | e | eq_ref | PRIMARY | PRIMARY | 4 | most_recent.event_id | 1 | 2 | DERIVED | article_events | ALL | NULL | NULL | NULL | NULL | 19743 | Using temporary; Using filesort