SQL performance: OR is slower than IN when using ORDER BY

I use MariaDB 10.0.21 and run a query similar to the following on a table of 12 million rows:

SELECT `primary_key` FROM `texas_parcels` WHERE `zip_code` IN ('28461', '48227', '60411', '65802', '75215', '75440', '75773', '75783', '76501', '76502', '76504', '76511', '76513', '76518', '76519', '76520', '76522', '76525', '76527', '76528', '76530', '76537', '76539', '76541', '76542', '76543', '76548', '76549', '76550', '76556', '76567', '76571', '76574', '76577', '76578', '76642', '76704', '76853', '77418', '77434', '77474', '77833', '77835', '77836', '77845', '77853', '77879', '77964', '77975', '78002', '78003', '78006', '78013', '78028', '78056', '78064', '78070', '78114', '78123', '78130', '78132', '78133', '78154', '78155', '78359', '78382', '78602', '78605', '78606', '78607', '78608', '78609', '78610', '78611', '78612', '78613', '78614', '78615', '78616', '78617', '78619', '78620', '78621', '78623', '78624', '78626', '78628', '78629', '78632', '78633', '78634', '78636', '78638', '78639', '78640', '78641', '78642', '78643', '78644', '78645', '78648', '78650', '78652', '78653', '78654', '78655', '78656', '78657', '78659', '78660', '78662', '78663', '78664', '78665', '78666', '78669', '78672', '78676', '78681', '78699', '78701', '78702', '78703', '78704', '78705', '78717', '78719', '78721', '78722', '78723', '78724', '78725', '78726', '78727', '78728', '78729', '78730', '78731', '78732', '78733', '78734', '78735', '78736', '78737', '78738', '78739', '78741', '78744', '78745', '78746', '78747', '78748', '78749', '78750', '78751', '78752', '78753', '78754', '78756', '78757', '78758', '78759', '78828', '78934', '78940', '78941', '78942', '78945', '78946', '78947', '78948', '78953', '78954', '78956', '78957', '78963', '92536') ORDER BY `timestamp_updated` ASC LIMIT 1000; 

I have a composite index on (zip_code, timestamp_updated), and I get results in ~1.6 seconds. In the next query I use the same zip codes, but with OR instead of IN ().

 SELECT `primary_key` FROM `texas_parcels` WHERE (`zip_code` = '28461' OR `zip_code` = '48227' OR `zip_code` = '60411' OR `zip_code` = '65802' OR `zip_code` = '75215' OR `zip_code` = '75440' OR `zip_code` = '75773' OR `zip_code` = '75783' OR `zip_code` = '76501' OR `zip_code` = '76502' OR `zip_code` = '76504' OR `zip_code` = '76511' OR `zip_code` = '78957' OR `zip_code` = '78963' OR `zip_code` = '92536' OR `zip_code` = '76513' OR `zip_code` = '76518' OR `zip_code` = '76519' OR `zip_code` = '76520' OR `zip_code` = '76522' OR `zip_code` = '76525' OR `zip_code` = '76527' OR `zip_code` = '76528' OR `zip_code` = '76530' OR `zip_code` = '76537' OR `zip_code` = '76539' OR `zip_code` = '76541' OR `zip_code` = '76542' OR `zip_code` = '76543' OR `zip_code` = '76548' OR `zip_code` = '76549' OR `zip_code` = '76550' OR `zip_code` = '76556' OR `zip_code` = '76567' OR `zip_code` = '76571' OR `zip_code` = '76574' OR `zip_code` = '76577' OR `zip_code` = '76578' OR `zip_code` = '76642' OR `zip_code` = '76704' OR `zip_code` = '76853' OR `zip_code` = '77418' OR `zip_code` = '77434' OR `zip_code` = '77474' OR `zip_code` = '77833' OR `zip_code` = '77835' OR `zip_code` = '77836' OR `zip_code` = '77845' OR `zip_code` = '77853' OR `zip_code` = '77879' OR `zip_code` = '77964' OR `zip_code` = '77975' OR `zip_code` = '78002' OR `zip_code` = '78003' OR `zip_code` = '78006' OR `zip_code` = '78013' OR `zip_code` = '78028' OR `zip_code` = '78056' OR `zip_code` = '78064' OR `zip_code` = '78070' OR `zip_code` = '78114' OR `zip_code` = '78123' OR `zip_code` = '78130' OR `zip_code` = '78132' OR `zip_code` = '78133' OR `zip_code` = '78154' OR `zip_code` = '78155' OR `zip_code` = '78359' OR `zip_code` = '78382' OR `zip_code` = '78602' OR `zip_code` = '78605' OR `zip_code` = '78606' OR `zip_code` = '78607' OR `zip_code` = '78608' OR `zip_code` = '78609' OR `zip_code` = '78610' OR `zip_code` = '78611' OR `zip_code` = '78612' OR `zip_code` = '78613' OR `zip_code` = '78614' OR `zip_code` = '78615' OR 
`zip_code` = '78616' OR `zip_code` = '78617' OR `zip_code` = '78619' OR `zip_code` = '78620' OR `zip_code` = '78621' OR `zip_code` = '78623' OR `zip_code` = '78624' OR `zip_code` = '78626' OR `zip_code` = '78628' OR `zip_code` = '78629' OR `zip_code` = '78632' OR `zip_code` = '78633' OR `zip_code` = '78634' OR `zip_code` = '78636' OR `zip_code` = '78638' OR `zip_code` = '78639' OR `zip_code` = '78640' OR `zip_code` = '78641' OR `zip_code` = '78642' OR `zip_code` = '78643' OR `zip_code` = '78644' OR `zip_code` = '78645' OR `zip_code` = '78648' OR `zip_code` = '78650' OR `zip_code` = '78652' OR `zip_code` = '78653' OR `zip_code` = '78654' OR `zip_code` = '78655' OR `zip_code` = '78656' OR `zip_code` = '78657' OR `zip_code` = '78659' OR `zip_code` = '78660' OR `zip_code` = '78662' OR `zip_code` = '78663' OR `zip_code` = '78664' OR `zip_code` = '78665' OR `zip_code` = '78666' OR `zip_code` = '78669' OR `zip_code` = '78672' OR `zip_code` = '78676' OR `zip_code` = '78681' OR `zip_code` = '78699' OR `zip_code` = '78701' OR `zip_code` = '78702' OR `zip_code` = '78703' OR `zip_code` = '78704' OR `zip_code` = '78705' OR `zip_code` = '78717' OR `zip_code` = '78719' OR `zip_code` = '78721' OR `zip_code` = '78722' OR `zip_code` = '78723' OR `zip_code` = '78724' OR `zip_code` = '78725' OR `zip_code` = '78726' OR `zip_code` = '78727' OR `zip_code` = '78728' OR `zip_code` = '78729' OR `zip_code` = '78730' OR `zip_code` = '78731' OR `zip_code` = '78732' OR `zip_code` = '78733' OR `zip_code` = '78734' OR `zip_code` = '78735' OR `zip_code` = '78736' OR `zip_code` = '78737' OR `zip_code` = '78738' OR `zip_code` = '78739' OR `zip_code` = '78741' OR `zip_code` = '78744' OR `zip_code` = '78745' OR `zip_code` = '78746' OR `zip_code` = '78747' OR `zip_code` = '78748' OR `zip_code` = '78757' OR `zip_code` = '78758' OR `zip_code` = '78759' OR `zip_code` = '78828' OR `zip_code` = '78934' OR `zip_code` = '78940' OR `zip_code` = '78941' OR `zip_code` = '78942' OR `zip_code` = '78945' OR 
`zip_code` = '78946' OR `zip_code` = '78947' OR `zip_code` = '78948' OR `zip_code` = '78953' OR `zip_code` = '78954' OR `zip_code` = '78956') ORDER BY `timestamp_updated` ASC LIMIT 1000; 

This second query returns the same results in the same order in ~7.8 seconds. When I run EXPLAIN on each query, the plans are almost the same; only the estimated number of rows differs slightly.

 id  select_type  table        type   possible_keys    key              key_len  ref     rows     filtered  Extra
 --  -----------  -----------  -----  ---------------  ---------------  -------  ------  -------  --------  ----------------------------------------
 1   SIMPLE       TX_Property  range  Zip Code Search  Zip Code Search  15       (NULL)  2402699  99.88     Using where; Using index; Using filesort
 2   SIMPLE       TX_Property  range  Zip Code Search  Zip Code Search  15       (NULL)  2321908  99.91     Using where; Using index; Using filesort

When profiling the two queries, the only significant increase in time is in the Sorting result stage, which took up to 7.2 seconds in the second query.
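
For reference, per-stage timings like these can be collected roughly as follows. This is only a sketch: the zip list is shortened for illustration, and the query number passed to SHOW PROFILE is an assumption (use whatever Query_ID SHOW PROFILES reports).

```sql
-- Enable per-statement profiling for this session.
SET profiling = 1;

-- Run the statement to be measured (zip list shortened for illustration).
SELECT `primary_key`
FROM `texas_parcels`
WHERE `zip_code` IN ('28461', '48227', '60411')
ORDER BY `timestamp_updated` ASC
LIMIT 1000;

-- List recent statements with their Query_ID and total duration.
SHOW PROFILES;

-- Per-stage durations for one statement; 1 is an assumed Query_ID.
SHOW PROFILE FOR QUERY 1;
```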

I guess I don't understand how a different operator in the WHERE clause can make such a huge difference to the ORDER BY. Wouldn't it make more sense if the significant time difference were in the execution stage? Maybe I'm just not understanding how profiling works, and this is really the execution time of the WHERE part, just labeled as sorting?

I also wanted to note that when I ran the queries without ORDER BY timestamp_updated ASC, the first query took ~0.106 seconds and the second query took ~0.157 seconds.

+2
3 answers

Removing the ORDER BY makes the query much faster because it can stop after just 1000 rows. How many rows does the OR / IN match?

Note that the EXPLAINs say the query is Using index. This means that you have a covering index. That is, all the columns in the SELECT are found in that one index.

In InnoDB, each secondary key implicitly includes the PK, so INDEX(zip_code, timestamp_updated) is effectively INDEX(zip_code, timestamp_updated, primary_key).

The index is not very efficient, since you have two non-trivial things: (1) IN or OR, (2) ORDER BY. An index can handle only one or the other. Your index lets the query use zip_code. It

  • finds the rows in the index that match any of those zip codes,
  • collects the timestamp and PK, putting the 3 columns in a tmp table,
  • sorts,
  • delivers the first 1000.

If instead you had INDEX(timestamp_updated, zip_code), you would still have a "covering" index, but with that column order the index will (I hope) remove the need for the sort. Oh, and consider that the scan could then stop after 1000 rows. Here is how it would work:

  • Scan the index in timestamp order.
  • Check each row for one of those zip codes. (Here the test may be faster in the IN form.)
  • If it matches, deliver the row; if that makes 1000, stop.
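
The alternative above can be tried with something like the following. This is only a sketch: the index name ts_zip is made up, the zip list is shortened, and FORCE INDEX is there just to see the plan for this index regardless of what the optimizer would pick on its own.

```sql
-- Hypothetical index name; the column order is timestamp first, then zip.
ALTER TABLE texas_parcels
    ADD INDEX ts_zip (timestamp_updated, zip_code);

-- Inspect the plan when the new index is used (zip list shortened).
EXPLAIN
SELECT primary_key
FROM texas_parcels FORCE INDEX (ts_zip)
WHERE zip_code IN ('28461', '48227', '60411')
ORDER BY timestamp_updated ASC
LIMIT 1000;
```

If it works as hoped, Extra should no longer show Using filesort. Whether it is actually faster depends on how early the matching rows appear in timestamp order.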

But wait... Now you are at the mercy of the 12M rows. If 1000 rows with those zips occur early (old timestamps), the scan can stop quickly. If it has to check all the rows to find 1000 (or there are not even 1000), then it is a full index scan, and this index ordering is "bad".

If you give the optimizer both INDEXes, it will dutifully make an intelligent choice based on inadequate information (no distribution of values), and it may pick the worse one.

You really need a two-dimensional index. None exists. (Well, maybe SPATIAL could be kludged into doing it?) But ...

PARTITION BY RANGE(timestamp) together with an INDEX starting with zip may work better. But I doubt that the optimizer is smart enough to realize that, if it finds 1000 rows in the first partition, it can quit. And it is still bad if there are not 1000 results.
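
A rough sketch of that partitioning, under several assumptions: timestamp_updated is a NOT NULL DATETIME (a TIMESTAMP column would need UNIX_TIMESTAMP() instead of TO_DAYS()), the partition names and boundary dates are placeholders, and the primary key can be widened, since every unique key on a partitioned table must include the partitioning column.

```sql
-- Assumption: widening the PK is acceptable; partitioning requires the
-- partitioning column to be part of every unique key, including the PK.
ALTER TABLE texas_parcels
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (primary_key, timestamp_updated);

-- Placeholder partition names and boundaries; the existing index on
-- (zip_code, timestamp_updated) is kept and becomes local to each partition.
ALTER TABLE texas_parcels
    PARTITION BY RANGE (TO_DAYS(timestamp_updated)) (
        PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
        PARTITION p2014 VALUES LESS THAN (TO_DAYS('2015-01-01')),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
```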

PARTITION BY RANGE(zip) together with an INDEX starting with the timestamp probably won't help, since with that many zips there won't be much partition pruning.

Please provide EXPLAIN FORMAT=JSON SELECT...; for each of your attempts. There may be some subtle clues to explain the wide variations in time.

Did you run each timing twice? (Otherwise, caching might color the results.)
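
One way to do that, as a sketch with the zip list shortened: run the same statement twice and compare the second runs of the IN and OR forms. SQL_NO_CACHE only keeps the query cache from storing or returning the result; the InnoDB buffer pool will still be warm on the second run, so both forms are measured under the same conditions.

```sql
-- Run this twice and note the second timing; then do the same for the OR form.
SELECT SQL_NO_CACHE primary_key
FROM texas_parcels
WHERE zip_code IN ('28461', '48227', '60411')   -- shortened list
ORDER BY timestamp_updated ASC
LIMIT 1000;
```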

Another approach

I don’t know how good it will be, but here goes:

 SELECT primary_key
 FROM (
     ( SELECT primary_key, timestamp_updated
       FROM texas_parcels
       WHERE zip_code = '28461'
       ORDER BY timestamp_updated LIMIT 1000 )
     UNION ALL
     ( SELECT primary_key, timestamp_updated
       FROM texas_parcels
       WHERE zip_code = '48227'
       ORDER BY timestamp_updated LIMIT 1000 )
     UNION ALL
     ( SELECT primary_key, timestamp_updated
       FROM texas_parcels
       WHERE zip_code = '60411'
       ORDER BY timestamp_updated LIMIT 1000 )
     ...
 ) x
 ORDER BY timestamp_updated
 LIMIT 1000

It seems that x will have only a few hundred thousand rows, not 1.3M. But UNION has some overhead, etc. Note the LIMIT in each subquery and on the outer query. If you also need an OFFSET, it gets trickier.

+2

You have a pretty long list of zip codes that you are comparing against. MySQL has an optimization that would explain why the execution time without the order by is slightly different. With a list of constants in IN, MySQL sorts the list and uses a binary search. I could see this explaining the latter results.

With the order by, I'm not sure. The actual execution time may be affected by other activity on the server. Do you know whether anything else is running?

+1

MySQL has an optimization for IN; when you use OR instead, the number of comparisons increases.

-1