Using a covering index to select records for a specific day

I would like to run these queries:

select url from weixin_kol_status where created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';

and

select url from weixin_kol_status where userid in ('...') and created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';

... using this table definition:

 CREATE TABLE `weixin_kol_status` (
   `id` bigint(20) NOT NULL AUTO_INCREMENT,
   `url` varchar(512) NOT NULL,
   `created_at` datetime NOT NULL,
   `title` varchar(512) NOT NULL DEFAULT '',
   `text` text,
   `attitudes_count` int(11) NOT NULL DEFAULT '0',
   `readcount` int(11) NOT NULL DEFAULT '0',
   `reposts_count` int(11) NOT NULL DEFAULT '0',
   `comments_count` int(11) NOT NULL DEFAULT '0',
   `userid` varchar(32) NOT NULL,
   `screen_name` varchar(32) NOT NULL,
   `type` tinyint(4) NOT NULL DEFAULT '0',
   `ext_data` text,
   `is_topline` tinyint(4) NOT NULL DEFAULT '0',
   `is_business` tinyint(4) NOT NULL DEFAULT '0',
   PRIMARY KEY (`id`),
   UNIQUE KEY `idx_url` (`url`(255)),
   KEY `idx_userid` (`userid`),
   KEY `idx_name` (`screen_name`),
   KEY `idx_created_at` (`created_at`)
 ) ENGINE=InnoDB AUTO_INCREMENT=328727437 DEFAULT CHARSET=utf8;

Rows: 328727437

These queries take a few minutes. How can I optimize them? How do I use a covering index?

Execution plans:

 explain select id from weixin_kol_status where created_at>='2015-12-11 00:00:00' and created_at<='2015-12-11 23:59:59'\G;
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: weixin_kol_status
          type: range
 possible_keys: idx_created_at
           key: idx_created_at
       key_len: 5
           ref: NULL
          rows: 1433704
         Extra: Using where; Using index
 1 row in set (0.00 sec)

and

 explain select id from weixin_kol_status where created_at='2015-12-11 00:00:00'\G;
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: weixin_kol_status
          type: ref
 possible_keys: idx_created_at
           key: idx_created_at
       key_len: 5
           ref: const
          rows: 1
         Extra: Using index
 1 row in set (0.00 sec)

But why does the first query show Extra: Using where; Using index, while the second shows only Extra: Using index? Did the first query not use a covering index?

2 answers

How do I use a covering index?

Do you know what a covering index is? It is an index that contains all the columns your query needs. So for

 select url from weixin_kol_status where created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59'; 

the minimal covering index would be something like

  KEY `idx_created_url` (`created_at`, `url`) 

And for

 select url from weixin_kol_status where userid in ('...') and created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59'; 

the minimal covering index could be

  KEY `idx_created_user_url` (`created_at`, `userid`, `url`) 

which would also cover the first query, or

  KEY `idx_user_created_url` (`userid`, `created_at`, `url`) 

which would not work for the first query but may do a better job of optimizing the second.
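To see the difference concretely, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption for illustration only: SQLite's planner and its EXPLAIN QUERY PLAN wording differ from MySQL's EXPLAIN, where the corresponding signal is "Using index" in the Extra column. The table is a hypothetical cut-down version with only the columns the queries touch. In SQLite an INTEGER PRIMARY KEY is stored in every index entry, which is loosely analogous to InnoDB storing the primary key in every secondary index; that is why SELECT id can be covered by a plain created_at index while SELECT url cannot.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for weixin_kol_status (SQLite, not MySQL).
SCHEMA = (
    "CREATE TABLE weixin_kol_status ("
    " id INTEGER PRIMARY KEY,"  # rowid alias: stored in every SQLite index entry
    " url TEXT NOT NULL,"
    " created_at TEXT NOT NULL,"
    " userid TEXT NOT NULL)"
)

ONE_DAY = "created_at >= '2015-12-11' AND created_at < '2015-12-12'"
Q_ID = f"SELECT id FROM weixin_kol_status WHERE {ONE_DAY}"
Q_URL = f"SELECT url FROM weixin_kol_status WHERE {ONE_DAY}"


def query_plan(index_ddl: str, query: str) -> str:
    """Create a fresh in-memory table with one index and return the plan text for query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute(index_ddl)
        rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
        # The human-readable description is the last column of each plan row.
        return " | ".join(str(row[-1]) for row in rows)
    finally:
        conn.close()


SINGLE = "CREATE INDEX idx_created_at ON weixin_kol_status (created_at)"
COMPOSITE = "CREATE INDEX idx_created_url ON weixin_kol_status (created_at, url)"

plan_id_single = query_plan(SINGLE, Q_ID)          # id comes along with each index entry
plan_url_single = query_plan(SINGLE, Q_URL)        # url requires a lookup into the table
plan_url_composite = query_plan(COMPOSITE, Q_URL)  # url is stored in the index itself

for label, plan in [
    ("SELECT id,  idx_created_at ", plan_id_single),
    ("SELECT url, idx_created_at ", plan_url_single),
    ("SELECT url, idx_created_url", plan_url_composite),
]:
    print(f"{label}: covered={'COVERING INDEX' in plan}  ({plan})")
```

The exact plan wording depends on the SQLite version, so the script only looks for the "COVERING INDEX" marker rather than matching full strings.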

You may need to write url(512) instead of url. VARCHAR columns do not index well. If MySQL reports an error that the indexed values are too wide, you may not be able to use a covering index with this query.

A covering index is useful because the query can be answered entirely from the index, which is often in memory, without reading the table rows from disk. Since memory is faster than disk, the query runs faster. Of course, if the index has been evicted from memory, it still has to be loaded from disk, so if you are memory-bound this may not help.

Note that a query will generally use only one index per table, so separate indexes on each column will not cover any query. You need a composite index that covers all the required columns at once.

As a side note, I think your > and < should be >= and <= respectively. It probably won't make much difference, but you seem to be missing two seconds a day.
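The "two seconds a day" can be checked by counting every whole-second value a plain DATETIME column (no fractional seconds) can hold on 2015-12-11 and applying each predicate in Python. This is a sketch of the arithmetic, not of MySQL itself. It also compares the half-open form (>= the day, < the next day). The fractional-second case at the end assumes a hypothetical DATETIME(6) column, which the question's table does not use.

```python
from datetime import datetime, timedelta

day_start = datetime(2015, 12, 11)
next_day = day_start + timedelta(days=1)
# Every whole-second value on 2015-12-11: 00:00:00 through 23:59:59.
seconds = [day_start + timedelta(seconds=s) for s in range(24 * 60 * 60)]

lo = datetime(2015, 12, 11, 0, 0, 0)
hi = datetime(2015, 12, 11, 23, 59, 59)

strict = sum(1 for t in seconds if lo < t < hi)                    # the question's > and <
inclusive = sum(1 for t in seconds if lo <= t <= hi)               # >= and <=
half_open = sum(1 for t in seconds if day_start <= t < next_day)   # >= day, < next day

print(strict, inclusive, half_open)  # 86398 86400 86400

# With fractional seconds (hypothetical DATETIME(6)), only the half-open
# form still catches a value such as 23:59:59.5.
late = datetime(2015, 12, 11, 23, 59, 59, 500000)
print(lo <= late <= hi, day_start <= late < next_day)  # False True
```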


A few problems:

  • UNIQUE(url(255)) enforces uniqueness only on the first 255 characters; that is probably not what you want.

  • If you need to enforce uniqueness on a long string (url), add another column containing MD5(url) and make that column UNIQUE. (Or something similar.)

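  • InnoDB limits each column in an index to 767 bytes (with the COMPACT and REDUNDANT row formats); with utf8 at up to 3 bytes per character, that is 255 characters. So if you try to create INDEX(created_at, url), you will end up with INDEX(created_at, url(255)), which is not covering because the full url values are not in the index.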

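  • Both EXPLAINs are useless for this discussion, since they do not use the SELECTs you are asking about. The first one says Using index because you wrote SELECT id (InnoDB stores the PRIMARY KEY in every secondary index, so id is already in idx_created_at); your actual query is SELECT url. That makes a significant difference to performance.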

  • You have a very large table, but I see no way that PARTITIONing would help with speed.

  • This is the best way to express a 1-day WHERE :

      created_at >= '2015-12-11' AND created_at < '2015-12-11' + INTERVAL 1 DAY 
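The MD5 and 767-byte points above can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the URLs are hypothetical, md5_key stands in for a hypothetical url_md5 column filled with MySQL's MD5(url) (assumed to be computed over the UTF-8 bytes), and the 3-bytes-per-character figure is MySQL's utf8 (utf8mb3) charset. Two URLs that differ only after character 255 collide under the prefix key but not under the hash.

```python
import hashlib

PREFIX_CHARS = 255       # as in UNIQUE KEY `idx_url` (`url`(255))
MAX_BYTES_PER_CHAR = 3   # MySQL's utf8 (utf8mb3) charset
INDEX_LIMIT_BYTES = 767  # per-column index limit discussed above


def prefix_key(url: str) -> str:
    """What a prefix UNIQUE index compares: only the first 255 characters."""
    return url[:PREFIX_CHARS]


def md5_key(url: str) -> str:
    """Value for a hypothetical url_md5 column: 32 lowercase hex characters."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Two hypothetical URLs that differ only after character 255.
base = "https://example.com/article?" + "x" * 300
url_a = base + "&id=1"
url_b = base + "&id=2"

prefix_collides = prefix_key(url_a) == prefix_key(url_b)
md5_collides = md5_key(url_a) == md5_key(url_b)
print(prefix_collides, md5_collides)  # True False

# Why the prefix is 255: the full varchar(512) column does not fit in 767 bytes.
print(PREFIX_CHARS * MAX_BYTES_PER_CHAR <= INDEX_LIMIT_BYTES)  # True  (765 bytes)
print(512 * MAX_BYTES_PER_CHAR <= INDEX_LIMIT_BYTES)           # False (1536 bytes)
```

A fixed-width 32-character hash column is small enough to index in full, which is why it can carry the UNIQUE constraint that the long url column cannot.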

To speed it up

Here is a technique that should help both queries. Instead of

 PRIMARY KEY (`id`), KEY `idx_created_at` (`created_at`) 

Do it like this:

 PRIMARY KEY(created_at, id), INDEX(id) 

This clusters the rows on created_at, which significantly reduces the amount of I/O, especially for the first SELECT. And yes, it is fine for the AUTO_INCREMENT column to be just INDEXed rather than UNIQUE or the PRIMARY KEY.
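A rough way to see the clustering idea outside MySQL is an SQLite WITHOUT ROWID table, which stores rows in PRIMARY KEY order, similar in spirit to InnoDB clustering rows on its PRIMARY KEY. This is a swapped-in engine with a hypothetical, trimmed schema and made-up rows. SQLite does not allow AUTOINCREMENT on WITHOUT ROWID tables, so id is supplied by hand here. The point is only that a one-day range becomes a single range search on the primary key, with url read from the same structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows live in the PRIMARY KEY b-tree, ordered by (created_at, id).
conn.execute(
    "CREATE TABLE weixin_kol_status ("
    " created_at TEXT NOT NULL,"
    " id INTEGER NOT NULL,"
    " url TEXT NOT NULL,"
    " PRIMARY KEY (created_at, id)"
    ") WITHOUT ROWID"
)
# Secondary index on id alone, mirroring INDEX(id) in the answer.
conn.execute("CREATE INDEX idx_id ON weixin_kol_status (id)")

# Hypothetical sample rows around the day of interest.
conn.executemany(
    "INSERT INTO weixin_kol_status VALUES (?, ?, ?)",
    [
        ("2015-12-10 23:59:59", 1, "u1"),
        ("2015-12-11 00:00:00", 2, "u2"),
        ("2015-12-11 12:00:00", 3, "u3"),
        ("2015-12-12 00:00:00", 4, "u4"),
    ],
)

sql = (
    "SELECT url FROM weixin_kol_status "
    "WHERE created_at >= '2015-12-11' AND created_at < '2015-12-12'"
)
plan = " | ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + sql))
urls = [r[0] for r in conn.execute(sql)]
print(plan)  # expected to mention a search using the PRIMARY KEY
print(urls)
conn.close()
```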

Warning: making this change will take several hours and a lot of disk space.
