MySQL grouping query optimization

I have three tables: categories, articles and article_events, with the following structure:

    categories:     id, name                       (100,000 rows)
    articles:       id, category_id                (6,000 rows)
    article_events: id, article_id, status_id      (20,000 rows)

The row with the highest article_events.id for each article describes that article's current status.

I want to return a list of categories together with the number of articles in each whose most recent event has a status_id of 1.

What I have so far works, but it is rather slow (10 seconds) with tables of this size. I wonder if there is a way to make it faster. As far as I know, all the tables have appropriate indexes.

    SELECT c.id, c.name,
           SUM(CASE WHEN e.status_id = 1 THEN 1 ELSE 0 END) article_count
    FROM categories c
    LEFT JOIN articles a ON a.category_id = c.id
    LEFT JOIN (
        SELECT article_id, MAX(id) event_id
        FROM article_events
        GROUP BY article_id
    ) most_recent ON most_recent.article_id = a.id
    LEFT JOIN article_events e ON most_recent.event_id = e.id
    GROUP BY c.id

Basically, I need to join the events table twice, because selecting status_id alongside MAX(id) simply returns the first status_id found, not the one from the row that holds the MAX(id).
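A minimal sketch of the two-step join described above, run on toy data so the result can be checked by hand. SQLite (via Python's sqlite3) stands in for MySQL here, and the sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Toy data only; SQLite is an assumption used so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE article_events (id INTEGER PRIMARY KEY, article_id INTEGER, status_id INTEGER);
INSERT INTO categories VALUES (1, 'news'), (2, 'sport'), (3, 'empty');
INSERT INTO articles VALUES (10, 1), (11, 1), (12, 2);
-- article 10 ends on status 1, article 11 ends on status 2, article 12 ends on status 1
INSERT INTO article_events VALUES
    (100, 10, 2), (101, 10, 1),
    (102, 11, 1), (103, 11, 2),
    (104, 12, 1);
""")

# Step 1 (derived table): find MAX(id) per article.
# Step 2 (second join): fetch the status_id that belongs to that exact event row.
LATEST_STATUS_COUNTS = """
SELECT c.id, c.name,
       SUM(CASE WHEN e.status_id = 1 THEN 1 ELSE 0 END) AS article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN (
    SELECT article_id, MAX(id) AS event_id
    FROM article_events
    GROUP BY article_id
) most_recent ON most_recent.article_id = a.id
LEFT JOIN article_events e ON e.id = most_recent.event_id
GROUP BY c.id, c.name
ORDER BY c.id
"""

rows = conn.execute(LATEST_STATUS_COUNTS).fetchall()
print(rows)
```

Article 11 has an older event with status 1, but its latest event has status 2, so it is not counted.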

How can I make this faster? Or do I just have to live with 10 seconds? Thanks!

Edit:

Here is the EXPLAIN output for the query:

    ID | select_type | table          | type   | possible_keys | key         | key_len | ref                  | rows   | Extra
    ---+-------------+----------------+--------+---------------+-------------+---------+----------------------+--------+---------------------------------------------
    1  | PRIMARY     | c              | index  | NULL          | PRIMARY     | 4       | NULL                 | 124044 | Using index; Using temporary; Using filesort
    1  | PRIMARY     | a              | ref    | category_id   | category_id | 4       | c.id                 | 3      |
    1  | PRIMARY     | <derived2>     | ALL    | NULL          | NULL        | NULL    | NULL                 | 6351   |
    1  | PRIMARY     | e              | eq_ref | PRIMARY       | PRIMARY     | 4       | most_recent.event_id | 1      |
    2  | DERIVED     | article_events | ALL    | NULL          | NULL        | NULL    | NULL                 | 19743  | Using temporary; Using filesort
3 answers

If you can eliminate subqueries by using JOINs, the query often performs better, because derived tables (subqueries in the FROM clause) generally cannot use indexes. Here is your query without subqueries:

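    -- ae1 is a candidate event; ae2 looks for any later event on the same article.
    -- "ae2.id IS NULL" keeps only rows where ae1 is the latest event.
    SELECT c.id, c.name,
           COUNT(CASE WHEN ae1.status_id = 1 THEN 1 END) AS article_count
    FROM categories c
    LEFT JOIN articles a ON a.category_id = c.id
    LEFT JOIN article_events ae1 ON ae1.article_id = a.id
    LEFT JOIN article_events ae2 ON ae2.article_id = a.id AND ae2.id > ae1.id
    WHERE ae2.id IS NULL
    GROUP BY c.id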
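To see how the self-join exclusion behaves, here is a small sketch on made-up rows. SQLite through Python's sqlite3 is used as a stand-in for MySQL, and the status_id = 1 condition is placed inside COUNT so that categories without matching articles still appear with a count of 0; treat it as an illustration rather than a tuned query.

```python
import sqlite3

# Toy data only; SQLite is an assumption used so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE article_events (id INTEGER PRIMARY KEY, article_id INTEGER, status_id INTEGER);
INSERT INTO categories VALUES (1, 'news'), (2, 'sport'), (3, 'empty');
INSERT INTO articles VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO article_events VALUES
    (100, 10, 2), (101, 10, 1),
    (102, 11, 1), (103, 11, 2),
    (104, 12, 1);
""")

# ae2 matches any event on the same article that is newer than ae1.
# Rows where no such ae2 exists (ae2.id IS NULL) are the latest events.
ANTI_JOIN_COUNTS = """
SELECT c.id, c.name,
       COUNT(CASE WHEN ae1.status_id = 1 THEN 1 END) AS article_count
FROM categories c
LEFT JOIN articles a ON a.category_id = c.id
LEFT JOIN article_events ae1 ON ae1.article_id = a.id
LEFT JOIN article_events ae2 ON ae2.article_id = a.id AND ae2.id > ae1.id
WHERE ae2.id IS NULL
GROUP BY c.id, c.name
ORDER BY c.id
"""

rows = conn.execute(ANTI_JOIN_COUNTS).fetchall()
print(rows)
```

Whether this form is actually faster than the derived-table version on MySQL depends on the server version and the available indexes, so it is worth comparing both with EXPLAIN.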

You will want to experiment with indexes and test with EXPLAIN, but here is my guess (I assume the id fields are primary keys and that you are using InnoDB):

    categories:     `name`
    articles:       `category_id`
    article_events: (`article_id`, `id`)
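As a rough sketch, the suggested indexes could be declared like this. The index names are hypothetical, and SQLite (via Python's sqlite3) is used only so the snippet runs without a server; on MySQL the equivalent would be CREATE INDEX or ALTER TABLE ... ADD INDEX statements on the same columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE article_events (id INTEGER PRIMARY KEY, article_id INTEGER, status_id INTEGER);

-- Index names below are hypothetical; the columns follow the suggestion above.
CREATE INDEX idx_categories_name ON categories (name);
CREATE INDEX idx_articles_category_id ON articles (category_id);
-- Composite index: lets "latest event per article" lookups stay inside the index.
CREATE INDEX idx_article_events_article_id_id ON article_events (article_id, id);
""")

# List the indexes that now exist.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)

# Ask the engine how it would find the newest event of one article.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT MAX(id) FROM article_events WHERE article_id = 10"
).fetchall()
for step in plan:
    print(step)
```

The plan output is engine-specific; on MySQL, re-run EXPLAIN on the real query after adding the indexes to see whether they are picked up.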

I haven't tested this, but I think it will save the database a bit of work:

    SELECT ae.article_id AS ref_article_id,
           MAX(ae.id) event_id,
           ae.status_id,  -- caution: not guaranteed to come from the MAX(ae.id) row
           (SELECT a.category_id FROM articles a WHERE a.id = ref_article_id) AS cat_id,
           (SELECT c.name FROM categories c WHERE c.id = cat_id) AS cat_name
    FROM article_events ae
    GROUP BY ae.article_id

Hope that helps

EDIT:

By the way: keep in mind that joins have to go through every row, so you should start your select from the small end and work your way up if you can. In this case, the query has to run through 100,000 category records and join them, then join those 100,000 rows again and again, and even where the joined values are NULL it still has to go through them.
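One way to read "start from the small end" is to aggregate over the two small tables first and join the result to categories once at the end. The sketch below does that on made-up rows, with SQLite (via Python's sqlite3) standing in for MySQL; whether MySQL's optimizer actually benefits is an assumption to verify with EXPLAIN.

```python
import sqlite3

# Toy data only; SQLite is an assumption used so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE article_events (id INTEGER PRIMARY KEY, article_id INTEGER, status_id INTEGER);
INSERT INTO categories VALUES (1, 'news'), (2, 'sport'), (3, 'empty');
INSERT INTO articles VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO article_events VALUES
    (100, 10, 2), (101, 10, 1),
    (102, 11, 1), (103, 11, 2),
    (104, 12, 1);
""")

# Inner query: works only on articles and article_events (the small tables)
# and produces one row per category that has at least one qualifying article.
# Outer query: a single LEFT JOIN from categories, with 0 for the rest.
PRE_AGGREGATED_COUNTS = """
SELECT c.id, c.name, COALESCE(per_cat.article_count, 0) AS article_count
FROM categories c
LEFT JOIN (
    SELECT a.category_id, COUNT(*) AS article_count
    FROM articles a
    JOIN article_events e ON e.article_id = a.id
    WHERE e.status_id = 1
      AND e.id = (SELECT MAX(e2.id)
                  FROM article_events e2
                  WHERE e2.article_id = a.id)
    GROUP BY a.category_id
) per_cat ON per_cat.category_id = c.id
ORDER BY c.id
"""

rows = conn.execute(PRE_AGGREGATED_COUNTS).fetchall()
print(rows)
```

The outer query no longer needs a GROUP BY over the large categories table, since the grouping happens on the smaller intermediate result.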

Hope this all helps ...


I don't like the fact that the index on categories.id is being used, given that you are selecting the whole table.

Try to run:

    ANALYZE TABLE categories;
    ANALYZE TABLE article_events;

and then re-run the query.


Source: https://habr.com/ru/post/1411836/

