Optimizing MySQL query to avoid scanning a large number of rows

Question


I am launching an application that uses tables similar to the ones below. There is one table for articles and another for tags. I want to get the last 30 articles for a specific tag, for example "acer", ordered by article id. The query below does the job, but I don't think it uses the index efficiently, because it will scan many rows if there are many articles related to that tag. How can I write a query that returns the same result without scanning a large number of rows?

EXPLAIN SELECT title FROM tag, article WHERE tag = 'acer' AND tag.article_id = article.id ORDER BY tag.article_id DESC LIMIT 0 , 30

Output

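    +----+-------------+---------+--------+---------------+---------+---------+-----------------------+--------+--------------------------+
    | id | select_type | table   | type   | possible_keys | key     | key_len | ref                   | rows   | Extra                    |
    +----+-------------+---------+--------+---------------+---------+---------+-----------------------+--------+--------------------------+
    |  1 | SIMPLE      | tag     | ref    | tag           | tag     | 92      | const                 | 220439 | Using where; Using index |
    |  1 | SIMPLE      | article | eq_ref | PRIMARY       | PRIMARY | 4       | testdb.tag.article_id |      1 |                          |
    +----+-------------+---------+--------+---------------+---------+---------+-----------------------+--------+--------------------------+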

Table definitions and sample data:

    CREATE TABLE `article` (
      `id` int(11) NOT NULL auto_increment,
      `title` varchar(60) NOT NULL,
      `time_stamp` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000001 ;

    --
    -- Dumping data for table `article`
    --

    INSERT INTO `article` VALUES (1, 'Saudi Apple type D', 1313390211);
    INSERT INTO `article` VALUES (2, 'Japan Apple type A', 1313420771);
    INSERT INTO `article` VALUES (3, 'UAE Samsung type B', 1313423082);
    INSERT INTO `article` VALUES (4, 'UAE Apple type H', 1313417337);
    INSERT INTO `article` VALUES (5, 'Japan Samsung type D', 1313398875);
    INSERT INTO `article` VALUES (6, 'UK Acer type B', 1313387888);
    INSERT INTO `article` VALUES (7, 'Saudi Sony type D', 1313429416);
    INSERT INTO `article` VALUES (8, 'UK Apple type B', 1313394549);
    INSERT INTO `article` VALUES (9, 'Japan HP type A', 1313427730);
    INSERT INTO `article` VALUES (10, 'Japan Acer type C', 1313400046);

    CREATE TABLE `tag` (
      `tag` varchar(30) NOT NULL,
      `article_id` int(11) NOT NULL,
      UNIQUE KEY `tag` (`tag`,`article_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

    --
    -- Dumping data for table `tag`
    --

    INSERT INTO `tag` VALUES ('Samsung', 1);
    INSERT INTO `tag` VALUES ('Acer', 2);
    INSERT INTO `tag` VALUES ('Sony', 3);
    INSERT INTO `tag` VALUES ('Apple', 4);
    INSERT INTO `tag` VALUES ('Acer', 5);
    INSERT INTO `tag` VALUES ('HP', 6);
    INSERT INTO `tag` VALUES ('Acer', 7);
    INSERT INTO `tag` VALUES ('Sony', 7);
    INSERT INTO `tag` VALUES ('Acer', 7);
    INSERT INTO `tag` VALUES ('Samsung', 9);


mysql query-optimization

usef_ksa Aug 15 '11 at 19:30


4 answers

Try the ANSI join syntax:

    SELECT title
    FROM tag t
    INNER JOIN article a ON t.article_id = a.id
    WHERE t.tag = 'acer'
    ORDER BY t.article_id DESC
    LIMIT 0, 30

Then put an index on tag.tag. Assuming the tag column is selective enough and article.id is the primary key, this should be pretty zippy.
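Before adding an index, it may be worth checking what the table already has. In the schema from the question, the UNIQUE KEY on (tag, article_id) starts with the tag column, so MySQL can already use it for lookups on tag alone, and a separate single-column index would likely be redundant. A minimal sketch, where idx_tag is a hypothetical index name:

```sql
-- List the indexes currently defined on the tag table.
SHOW INDEX FROM tag;

-- Assumption: only needed when no existing index has `tag` as its first column.
-- idx_tag is a made-up name used for illustration.
ALTER TABLE tag ADD INDEX idx_tag (tag);
```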


Jeremy holovacs Aug 15 '11 at 19:39


Edit: add this index

 UNIQUE KEY tag (article_id,tag)
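One way to add such an index to the existing table is sketched below. The index name article_id_tag is an assumption, chosen because the name tag is already taken by the existing key in the question's schema. An index that leads with article_id mainly helps lookups by article; a query filtering on tag = 'acer' would still rely on the existing (tag, article_id) key.

```sql
-- article_id_tag is a hypothetical name; pick any name not already used on the table.
ALTER TABLE tag ADD UNIQUE KEY article_id_tag (article_id, tag);

-- Confirm that both keys are now present.
SHOW INDEX FROM tag;
```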


Gerry Aug 15 '11 at 19:56


I would suggest changing the storage engine and the schema so that you can use foreign keys.

    CREATE TABLE `article` (
      `id` int(11) NOT NULL auto_increment,
      `title` varchar(60) NOT NULL,
      `time_stamp` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000001 ;

    CREATE TABLE `tag` (
      `id` int(11) NOT NULL auto_increment,
      `tag` varchar(30) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE `article_tag` (
      `id` int(11) NOT NULL auto_increment,
      `article_id` int(11) NOT NULL,
      `tag_id` int(11) NOT NULL,
      PRIMARY KEY (`id`),
      FOREIGN KEY (`article_id`) REFERENCES article(id),
      FOREIGN KEY (`tag_id`) REFERENCES tag(id)
    ) ENGINE=Innodb;

This results in the following query:

    EXPLAIN SELECT *
    FROM article
    JOIN article_tag ON article.id = article_tag.id
    JOIN tag ON article_tag.tag_id = tag.id
    WHERE tag.tag="Acer";

    +----+-------------+-------------+--------+----------------+---------+---------+-------------------------+------+-------------+
    | id | select_type | table       | type   | possible_keys  | key     | key_len | ref                     | rows | Extra       |
    +----+-------------+-------------+--------+----------------+---------+---------+-------------------------+------+-------------+
    |  1 | SIMPLE      | article_tag | ALL    | PRIMARY,tag_id | NULL    | NULL    | NULL                    |    1 |             |
    |  1 | SIMPLE      | tag         | eq_ref | PRIMARY        | PRIMARY | 4       | temp.article_tag.tag_id |    1 | Using where |
    |  1 | SIMPLE      | article     | eq_ref | PRIMARY        | PRIMARY | 4       | temp.article_tag.id     |    1 |             |
    +----+-------------+-------------+--------+----------------+---------+---------+-------------------------+------+-------------+
    3 rows in set (0.00 sec)
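For the question's actual goal (the last 30 articles for one tag), a query against this three-table schema might look like the sketch below. It assumes the join is meant to go through article_tag.article_id, and it adds two hypothetical indexes (uq_tag and idx_tag_article are made-up names): one so the tag name can be looked up directly, and one so the rows for a given tag can be read in article_id order. Whether MySQL then actually avoids a sort should be confirmed with EXPLAIN on real data.

```sql
-- Hypothetical index: look up a tag row by its name without scanning the tag table.
ALTER TABLE tag ADD UNIQUE KEY uq_tag (tag);

-- Hypothetical index: entries for one tag_id are kept in article_id order.
ALTER TABLE article_tag ADD INDEX idx_tag_article (tag_id, article_id);

-- Last 30 articles for the tag, newest article id first.
SELECT a.title
FROM tag t
JOIN article_tag atg ON atg.tag_id = t.id
JOIN article a ON a.id = atg.article_id
WHERE t.tag = 'Acer'
ORDER BY atg.article_id DESC
LIMIT 30;
```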


Sean Aug 15 '11 at 19:57


Quassnoi · Accepted Answer · 2011-08-17T10:18:31+0000

What makes you think that the query will check a large number of rows?

The query will scan exactly 30 records using the UNIQUE index on tag (tag, article_id), join the article to each record on the PRIMARY KEY, and stop.

This is exactly what your plan says.
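The rows value in EXPLAIN is an estimate of how many index entries match the condition, and it generally does not reflect the effect of LIMIT. To see how many rows a query really touched, one option is to compare the session's Handler_read counters before and after running it. This is a sketch, assuming your account is allowed to run FLUSH STATUS:

```sql
-- Reset the session status counters (may require the RELOAD privilege).
FLUSH STATUS;

-- Run the query under test.
SELECT title
FROM tag t
JOIN article a ON a.id = t.article_id
WHERE t.tag = 'acer'
ORDER BY t.article_id DESC
LIMIT 30;

-- Show how many index and row reads the query performed.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

If the counters stay in the tens rather than the hundreds of thousands, the query stopped after reading the first 30 matching index entries.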

I just made this test script:

    CREATE TABLE `article` (
      `id` int(11) NOT NULL auto_increment,
      `title` varchar(60) NOT NULL,
      `time_stamp` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000001 ;

    CREATE TABLE `tag` (
      `tag` varchar(30) NOT NULL,
      `article_id` int(11) NOT NULL,
      UNIQUE KEY `tag` (`tag`,`article_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

    INSERT INTO article
    SELECT id, CONCAT('Article ', id), UNIX_TIMESTAMP('2011-08-17' - INTERVAL id SECOND)
    FROM t_source;

    INSERT INTO tag
    SELECT CASE fld WHEN 1 THEN CONCAT('tag', (id - 1) div 10 + 1) ELSE tag END AS tag, id
    FROM (
      SELECT tag, id,
             FIELD(tag, 'Other', 'Acer', 'Sony', 'HP', 'Dell') AS fld,
             RAND(20110817) AS rnd
      FROM (
        SELECT 'Other' AS tag
        UNION ALL SELECT 'Acer' AS tag
        UNION ALL SELECT 'Sony' AS tag
        UNION ALL SELECT 'HP' AS tag
        UNION ALL SELECT 'Dell' AS tag
      ) t
      JOIN t_source
    ) q
    WHERE POWER(3, -fld) > rnd;

where t_source is a table with 1M records in it, and then ran your query:

 SELECT * FROM tag t JOIN article a ON a.id = t.article_id WHERE t.tag = 'acer' ORDER BY t.article_id DESC LIMIT 30;

It completed instantly.
