The example you presented assumes InnoDB is being used. Let's say the PRIMARY KEY is just `id`.
`INDEX(sex, rating)` is the "secondary key". Each secondary key (in InnoDB) implicitly includes the PK, so it is really an ordered list of `(sex, rating, id)` values. To get to the "data" (`<cols>`), it uses `id` to drill into the PK BTree (which also contains the data) to find the row.
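For concreteness, a hypothetical schema matching this discussion might look like the following (the exact column types are assumptions, not taken from the question):

```sql
CREATE TABLE profiles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    sex ENUM('M','F') NOT NULL,
    rating INT NOT NULL,
    -- ... the other, possibly bulky, columns referred to as <cols> ...
    PRIMARY KEY (id),
    INDEX (sex, rating)   -- InnoDB effectively stores this as (sex, rating, id)
) ENGINE=InnoDB;
```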
Fast case: Consequently,

```sql
SELECT id FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10;
```

will do a "range scan" of 100,010 "rows" in the index. This will be reasonably efficient I/O-wise, since all the information is consecutive and nothing is wasted. (No, it is not smart enough to leap over the first 100,000 rows; that would get quite messy, especially when you factor in transaction isolation.) Those 100,010 rows probably fit in about 1000 index blocks. Then it delivers the 10 `id` values.
With those 10 ids, it can do 10 joins ("NLJ" = "Nested Loop Join"). Since the 10 rows are likely to be scattered around the table, this may take 10 disk hits.
Let's "count the disk hits" (ignoring non-leaf nodes in the BTrees, which are likely to be cached anyway): 1000 + 10 = 1010. On ordinary spinning disks, this might take 10 seconds.
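Putting the fast case to work means rewriting the original query as a "deferred join": let the index find the 10 ids cheaply, then fetch the bulky columns only for those 10 rows. A sketch, reusing the table and column names from the question (the alias names are arbitrary):

```sql
SELECT p.*                    -- or just the specific <cols> you need
FROM ( SELECT id
         FROM profiles
        WHERE sex = 'M'
        ORDER BY rating
        LIMIT 100000, 10
     ) AS picked
JOIN profiles AS p USING (id)
ORDER BY p.rating;            -- re-sort; the join does not guarantee order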
Slow case. Now consider the original query (`SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10;`), still assuming `INDEX(sex, rating)` plus the implicit `id` tacked on the end.
As before, it will scan 100,010 rows through the index (1000 disk hits). But, as discussed, it is too dumb to do what was done above. It will reach over into the data to get `<cols>` for each row. This often (depending on caching) requires a random disk hit. So this could be as much as 100,010 disk hits (if the table is huge and caching is not very helpful).

Again, 100,000 rows are tossed away and 10 are delivered. Total "cost": 100,010 disk hits (worst case), which might take 17 minutes.
Keep in mind that there are 3 editions of High Performance MySQL, written over the past 13 years. You are probably using a much newer version of MySQL than they covered. I do not know whether the Optimizer has become smarter in this area since then. These, if available to you, may give clues:
```sql
EXPLAIN FORMAT=JSON SELECT ...;
OPTIMIZER TRACE...
```
My favorite "Handler" trick for learning how things really work may be useful:
```sql
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
```
You will probably see numbers like 100,000 and 10, or small multiples thereof. But note that a fast read during an index scan is tallied exactly the same (1 per row) as a slow random disk hit into the bulky `<cols>` — the Handler counters do not distinguish cheap reads from expensive ones.
Review: For this technique to work, the subquery needs a "covering" index, with the columns in the right order.

"Covering" means that `(sex, rating, id)` contains all the columns the subquery touches. (We assume `<cols>` contains other, possibly bulky, columns that would not fit in the INDEX.)

"Right" column order: the columns are in the correct order to handle the entire query. (See also my Index Cookbook.)
- First, the `WHERE` column(s) compared with `=` to constants (`sex`).
- Then the entire `ORDER BY`, in order (`rating`).
- Finally, whatever else is needed to make it "covering" (`id`).
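Applying that checklist to this query, the covering index could be added like so (the index name is arbitrary):

```sql
ALTER TABLE profiles
  ADD INDEX sex_rating (sex, rating);
-- No need to list `id` explicitly: InnoDB appends the PK to every
-- secondary index, so this acts as (sex, rating, id) -- which
-- covers the subquery `SELECT id ... WHERE sex=? ORDER BY rating`.
```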