How can I avoid a full table scan in this MySQL query?

 explain select *
 from zipcode_distances z
 inner join venues v on z.zipcode_to = v.zipcode
 inner join events e on v.id = e.venue_id
 where z.zipcode_from = '92108'
   and z.distance <= 5

I'm trying to find all the "events at venues within 5 miles of zipcode 92108", but I'm having trouble optimizing this query.

Here is the EXPLAIN output:

 id | select_type | table | type   | possible_keys                                             | key             | key_len | ref                         | rows  | Extra
 1  | SIMPLE      | e     | ALL    | idx_venue_id                                              |                 |         |                             | 60024 |
 1  | SIMPLE      | v     | eq_ref | PRIMARY,idx_zipcode                                       | PRIMARY         | 4       | comedyworld.e.venue_id      | 1     |
 1  | SIMPLE      | z     | ref    | idx_zip_from_distance,idx_zip_to_distance,idx_zip_from_to | idx_zip_from_to | 30      | const,comedyworld.v.zipcode | 1     | Using where; Using index

I get a full table scan on table "e" (events), and I cannot figure out which index I need to create to make the query fast.

Any advice would be appreciated.

thanks

+7
4 answers

Based on the EXPLAIN output in your question, you already have all the indexes the query should use, namely:

 CREATE INDEX idx_zip_from_distance ON zipcode_distances (zipcode_from, distance, zipcode_to);
 CREATE INDEX idx_zipcode ON venues (zipcode, id);
 CREATE INDEX idx_venue_id ON events (venue_id);

(I can't tell from your index names whether idx_zip_from_distance includes the zipcode_to column. If it doesn't, you should add it to make it a covering index. I also included the venues.id column in idx_zipcode for completeness, but assuming it is the primary key of the table and you are using InnoDB, it will be included automatically anyway.)
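To check which columns each index actually contains, something along these lines could be used. This is only a sketch: it assumes the index name idx_zip_from_distance from the EXPLAIN output, and the ALTER is needed only if zipcode_to turns out to be missing.

```sql
-- List every index on the table; rows with the same Key_name are the
-- columns of one index, ordered by Seq_in_index.
SHOW INDEX FROM zipcode_distances;

-- Only if zipcode_to is NOT already part of idx_zip_from_distance:
-- rebuild the index with zipcode_to appended so it covers the join column.
ALTER TABLE zipcode_distances
  DROP INDEX idx_zip_from_distance,
  ADD INDEX idx_zip_from_distance (zipcode_from, distance, zipcode_to);
```

With all three columns in the index, MySQL can answer the zipcode_distances part of the query from the index alone, which EXPLAIN reports as "Using index".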

However, it seems that MySQL is choosing a different, and possibly suboptimal, query plan: it scans through all the events, looks up their venues and zip codes, and only then filters the results by distance. This could be the optimal plan if the cardinality of the events table were low enough, but since you are asking this question, I assume it is not.

One reason for the suboptimal query plan might be that you have too many indexes, which confuses the planner. For example, do you really need all three of those indexes on the zipcode_distances table, given that the data stored in it is apparently symmetric? Personally, I would suggest keeping only the index described above, plus a unique index (which can also be the primary key if you don't have an artificial one) on (zipcode_to, zipcode_from), preferably in that order, so that any occasional queries on zipcode_to=? can make use of it.
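As a rough sketch of that cleanup, assuming the index names shown in the EXPLAIN output, that no other query depends on the two dropped indexes, and that each (zipcode_to, zipcode_from) pair occurs only once (the name uq_zip_to_from is made up for illustration):

```sql
-- Drop the two indexes the suggested layout does not need and add the
-- unique index in (zipcode_to, zipcode_from) order.
-- The statement fails if the table contains duplicate pairs.
ALTER TABLE zipcode_distances
  DROP INDEX idx_zip_to_distance,
  DROP INDEX idx_zip_from_to,
  ADD UNIQUE INDEX uq_zip_to_from (zipcode_to, zipcode_from);
```

It is worth trying this on a copy of the table first and re-running EXPLAIN afterwards, since removing indexes can change the plans of other queries too.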

However, based on some testing, I suspect that the main reason MySQL chooses the wrong query plan comes down to the relative cardinalities of your tables. Presumably your actual zipcode_distances table is huge, and MySQL is not smart enough to realize how much the conditions in the WHERE clause really narrow it down.

If so, the best and easiest solution may be to simply force MySQL to use the indexes you want:
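One way to check that suspicion is to compare the table sizes with the number of rows that survive the WHERE conditions. The statements below are a diagnostic sketch using only the table and column names from the question:

```sql
-- Total rows in the two tables the planner has to choose between.
SELECT COUNT(*) FROM zipcode_distances;
SELECT COUNT(*) FROM events;

-- Rows of zipcode_distances that actually match the WHERE clause.
SELECT COUNT(*)
FROM zipcode_distances
WHERE zipcode_from = '92108'
  AND distance <= 5;

-- The Cardinality column is the planner's estimate of distinct values
-- per index; it is an estimate and can be far from the real numbers.
SHOW INDEX FROM zipcode_distances;
SHOW INDEX FROM events;
```

A very large zipcode_distances table combined with a small matching count would support that explanation.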

 select *
 from zipcode_distances z FORCE INDEX (idx_zip_from_distance)
 inner join venues v FORCE INDEX (idx_zipcode) on z.zipcode_to = v.zipcode
 inner join events e FORCE INDEX (idx_venue_id) on v.id = e.venue_id
 where z.zipcode_from = '92108'
   and z.distance <= 5

With this query, you really do get the desired query plan. (You need FORCE INDEX here, because with just USE INDEX the query planner can still decide to use a table scan instead of the suggested index, defeating the purpose. This happened to me when I first tested it.)

P.S. Here's a SQLize demo, with and without FORCE INDEX, demonstrating the problem.
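To confirm that the hints have the intended effect on your own data, the hinted query can be prefixed with EXPLAIN. This is a sketch that assumes the three indexes exist under the names used above:

```sql
EXPLAIN
SELECT *
FROM zipcode_distances z FORCE INDEX (idx_zip_from_distance)
INNER JOIN venues v FORCE INDEX (idx_zipcode) ON z.zipcode_to = v.zipcode
INNER JOIN events e FORCE INDEX (idx_venue_id) ON v.id = e.venue_id
WHERE z.zipcode_from = '92108'
  AND z.distance <= 5;
-- Things to look for in the output:
--   * no row with type = ALL (that would still be a full table scan)
--   * the key column naming the forced index for each table
--   * a rows estimate for events far below the 60024 of the original plan
```

If a row still shows type = ALL, the index named in the hint probably does not exist under that name or does not start with the column used in the join.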

+7

Do you have indexes on the join columns in both tables?

 e.venue_id and v.id

If not, create them in both tables. If you already have them, perhaps one or more of the tables has so few rows that the optimizer decides a full scan is cheaper than an indexed read.

+1

You can use a subquery:

 select *
 from zipcode_distances z, venues v, events e
 where z.id in (
     select zd.id
     from zipcode_distances zd
     where zd.zipcode_from = '92108'
       and zd.distance <= 5
   )
   and z.zipcode_to = v.zipcode
   and v.id = e.venue_id

0

You are selecting all the columns from all the tables (select *), so there is little point in the optimizer using an index when the query engine then has to go from the index to the table for each row anyway.
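A minimal sketch of the alternative, assuming only the event columns are needed in the result and that the indexes suggested elsewhere on this page exist:

```sql
-- Return only the event rows. zipcode_distances and venues then need just
-- the columns used in the joins and the WHERE clause, which an index on
-- (zipcode_from, distance, zipcode_to) and one on (zipcode, id) can supply
-- without touching the table rows.
SELECT e.*
FROM zipcode_distances z
INNER JOIN venues v ON z.zipcode_to = v.zipcode
INNER JOIN events e ON v.id = e.venue_id
WHERE z.zipcode_from = '92108'
  AND z.distance <= 5;
```

If specific venue or distance columns are needed as well, listing them explicitly instead of using * still keeps the row lookups to a minimum.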

0