The example you presented assumes InnoDB is being used. Let's say the PRIMARY KEY is just `id`.
`INDEX(sex, rating)` is the "secondary key". Each secondary key (in InnoDB) implicitly includes the PK, so it is really an ordered list of `(sex, rating, id)` values. To get to the "data" (`<cols>`), it uses `id` to drill into the PK BTree (which also contains the data) to find the row.
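For concreteness, a hypothetical schema matching this discussion might look like the following (the exact column types are assumptions, not taken from the question):

```sql
CREATE TABLE profiles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    sex ENUM('M','F') NOT NULL,
    rating INT NOT NULL,
    -- ... the other, possibly bulky, columns referred to as <cols> ...
    PRIMARY KEY (id),
    INDEX (sex, rating)   -- InnoDB effectively stores this as (sex, rating, id)
) ENGINE=InnoDB;
```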
Fast case: Consequently,

```sql
SELECT id FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10;
```

will do a "range scan" of 100,010 "rows" in the index. This will be reasonably efficient I/O-wise, since all the information is consecutive and nothing is wasted. (No, it is not smart enough to leap over the first 100,000 rows; that would get quite messy, especially when you factor in transaction isolation.) Those 100,010 rows probably fit in about 1000 index blocks. Then it delivers the 10 `id` values.
With those 10 ids, it can do 10 joins ("NLJ" = "Nested Loop Join"). Since the 10 rows are likely to be scattered around the table, this may take 10 disk hits.
Let's "count the disk hits" (ignoring non-leaf nodes in the BTrees, which are likely to be cached anyway): 1000 + 10 = 1010. On ordinary spinning disks, this might take 10 seconds.
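Putting the fast case to work means rewriting the original query as a "deferred join": let the index find the 10 ids cheaply, then fetch the bulky columns only for those 10 rows. A sketch, reusing the table and column names from the question (the alias names are arbitrary):

```sql
SELECT p.*                    -- or just the specific <cols> you need
FROM ( SELECT id
         FROM profiles
        WHERE sex = 'M'
        ORDER BY rating
        LIMIT 100000, 10
     ) AS picked
JOIN profiles AS p USING (id)
ORDER BY p.rating;            -- re-sort; the join does not guarantee order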
Slow case. Now consider the original query (`SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10;`), still assuming `INDEX(sex, rating)` plus the implicit `id` tacked on the end.
As before, it will scan 100,010 rows through the index (1000 disk hits). But, as discussed, it is too dumb to do what was done above. It will reach over into the data to get `<cols>` for each row. This often (depending on caching) requires a random disk hit. So this could be as much as 100,010 disk hits (if the table is huge and caching is not very helpful).

Again, 100,000 rows are tossed away and 10 are delivered. Total "cost": 100,010 disk hits (worst case), which might take 17 minutes.
Keep in mind that there are 3 editions of High Performance MySQL, written over the past 13 years. You are probably using a much newer version of MySQL than they covered. I do not know whether the Optimizer has become smarter in this area since then. These, if available to you, may give clues:
```sql
EXPLAIN FORMAT=JSON SELECT ...;
OPTIMIZER TRACE...
```
My favorite "Handler" trick for learning how things really work may be useful:
```sql
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
```
You will probably see numbers like 100,000 and 10, or small multiples thereof. But note that a fast read during an index scan is tallied exactly the same (1 per row) as a slow random disk hit into the bulky `<cols>` — the Handler counters do not distinguish cheap reads from expensive ones.
Review: For this technique to work, the subquery needs a "covering" index, with the columns in the right order.

"Covering" means that `(sex, rating, id)` contains all the columns the subquery touches. (We assume `<cols>` contains other, possibly bulky, columns that would not fit in the INDEX.)

"Right" column order: the columns are in the correct order to handle the entire query. (See also my Index Cookbook.)
- First, the `WHERE` column(s) compared with `=` to constants (`sex`).
- Then the entire `ORDER BY`, in order (`rating`).
- Finally, whatever else is needed to make it "covering" (`id`).
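Applying that checklist to this query, the covering index could be added like so (the index name is arbitrary):

```sql
ALTER TABLE profiles
  ADD INDEX sex_rating (sex, rating);
-- No need to list `id` explicitly: InnoDB appends the PK to every
-- secondary index, so this acts as (sex, rating, id) -- which
-- covers the subquery `SELECT id ... WHERE sex=? ORDER BY rating`.
```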