SQL WHERE xx IN ('qq', 'ww' ...) performance - more values, less time?

I finally ask my first question (although I have been a stalker for a long time).

The SQL query got my attention the other day at work. The problem is the performance in the WHERE when comparing the index with possible values ​​using the IN operator.

 SELECT SUM (parts.quantity) AS quantity, concessions.concessionCode, concessions.description AS concessionDesc, parts.type, activities.activityCode, REPLACE (activities.activityCode, activities.lvl2 || '-', '') AS activityCodeDisplay, strings.activityDesc, strings.activityDesc2, strings.activityDesc3 FROM tb_parts parts, tb_activities activities, tb_strings strings, tb_concessions concessions WHERE parts.activityCode = activities.activityCode AND parts.concessionCode = activities.concessionCode AND activities.concessionCode = concesions.concessionCode AND activities.concessionCode = strings.concessionCode AND activities.activityCode = strings.activityCode AND strings.language = 'ENG' --AND parts.concesionCode IN ('ZD', 'G9', 'TR', 'JS0') AND parts.concesionCode IN ('ZD', 'G9') AND parts.date >= TO_DATE ('01/01/2013 00:00:00', 'DD/MM/YYYY HH24:MI:SS') AND parts.date <= TO_DATE ('30/04/2013 23:59:59', 'DD/MM/YYYY HH24:MI:SS') AND parts.type IN ('U', 'M') AND parts.value = 'E' GROUP BY concesions.concessionCode, concesions.description, parts.type, activities.activityCode, REPLACE (activities.activityCode, activities.lvl2|| '-', ''), strings.activityDesc, strings.activityDesc2, strings.activityDesc3 ORDER BY concesions.concessionCode; 

The problem that I have is this: if the request is executed like it (with two values ​​for IN ), it takes 30 seconds. If it is run with four values ​​(for example, in the comments), the request takes 5 seconds. I would expect that comparing an index with multiple values ​​would take longer, but it seems to be wrong. I repeat the “test” several times during the day, and they are always more or less the same (30 + -1s, 5 + -1s).

Any insight into why this behaves this way would be more than appreciated!

PS I rewrote the names of the tables / columns, so sorry if there are any discrepancies.

PPS I rewrote this code with connections and it is much faster, but the reason for this anomaly still bothers me :)

EDIT : Finally, at work! After some alterations, I was able to create execution plans for these two versions and even for the third version of the request (with both 4 and 2 values ​​in WHERE , the time is about 600 ms). In addition, there were a few questions about the data in the tables, so here is some information:

 All the stats are analyzed the day that queries were executed Table parts total rows - 3.2 M matches for 2 values - 1.08 M (~34%) matches for 4 values - 1.30 M (~41%) Table activities total rows - 3866 matches for 2 values - 321 (~ 8%) matches for 4 values - 644 (~16%) Table strings total rows - 7436 matches for 2 values - 642 (~ 8%) matches for 4 values - 1288 (~17%) Index in_parts codConcession username date 

Because of this, I believe that when using dynamic sampling there is no significant difference (except + 2/3) (if I did it correctly, that is, with /*+ dynamic_sampling(tb_parts 10) */ after the SELECT keyword)

For two values :

 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 186 | 864 (1)| 00:00:11 | | 1 | SORT ORDER BY | | 1 | 186 | 864 (1)| 00:00:11 | | 2 | HASH GROUP BY | | 1 | 186 | 864 (1)| 00:00:11 | |* 3 | TABLE ACCESS BY INDEX ROWID | tb_parts | 1 | 37 | 818 (1)| 00:00:10 | | 4 | NESTED LOOPS | | 1 | 186 | 862 (1)| 00:00:11 | | 5 | NESTED LOOPS | | 1 | 149 | 44 (0)| 00:00:01 | | 6 | NESTED LOOPS | | 34 | 2108 | 10 (0)| 00:00:01 | | 7 | INLIST ITERATOR | | | | | | | 8 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 2 | 54 | 2 (0)| 00:00:01 | |* 9 | INDEX UNIQUE SCAN | pk_concession | 2 | | 1 (0)| 00:00:01 | | 10 | TABLE ACCESS BY INDEX ROWID | tb_activities | 17 | 595 | 4 (0)| 00:00:01 | |* 11 | INDEX RANGE SCAN | pk_activity | 17 | | 2 (0)| 00:00:01 | | 12 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 | |* 13 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 | |* 14 | INDEX RANGE SCAN | in_parts | 454 | | 648 (1)| 00:00:08 | ----------------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("parts"."value"='E' AND ("parts"."type"='M' OR "parts"."type"='U') AND "parts"."activityCode"="activities"."activityCode") 9 - access("concessions"."concessionCode"='G9' OR "concessions"."concessionCode"='ZD') 11 - access("activities"."concessionCode"="concessions"."concessionCode") filter("activities"."concessionCode"='G9' OR "activities"."concessionCode"='ZD') 13 - access("activities"."concessionCode"="strings"."concessionCode" AND "activities"."activityCode"="strings"."activityCode" AND "strings"."language"='ENG') filter("strings"."concessionCode"='G9' OR "strings"."concessionCode"='ZD') 14 - access("parts"."concessionCode"="activities"."concessionCode" AND "parts"."date">=TO_DATE('2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) filter("parts"."date">=TO_DATE('2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND ("parts"."concessionCode"='G9' OR "parts"."concessionCode"='ZD') AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 

For four values :

 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 186 | 7412 (2)| 00:01:29 | | 1 | SORT ORDER BY | | 1 | 186 | 7412 (2)| 00:01:29 | | 2 | HASH GROUP BY | | 1 | 186 | 7412 (2)| 00:01:29 | | 3 | NESTED LOOPS | | 1 | 186 | 7410 (2)| 00:01:29 | |* 4 | HASH JOIN | | 17 | 1683 | 7393 (2)| 00:01:29 | |* 5 | HASH JOIN | | 136 | 8432 | 21 (5)| 00:00:01 | | 7 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 4 | 108 | 2 (0)| 00:00:01 | |* 8 | INDEX UNIQUE SCAN | pk_concession | 4 | | 1 (0)| 00:00:01 | |* 9 | TABLE ACCESS FULL | tb_activities | 644 | 22540 | 18 (0)| 00:00:01 | |* 10 | TABLE ACCESS FULL | tb_parts | 4310 | 155K| 7372 (2)| 00:01:29 | | 11 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 | |* 12 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- --------------------------------------------------- 4 - access("parts"."activityCode"="activities"."activityCode" AND "parts"."concessionCode"="activities"."concessionCode") 5 - access("activities"."concessionCode"="concessions"."concessionCode") 8 - access("concessions"."concessionCode"='G9' OR "concessions"."concessionCode"='JS0' OR "concessions"."concessionCode"='TR' OR "concessions"."concessionCode"='ZD') 9 - filter("activities"."concessionCode"='G9' OR "activities"."concessionCode"='JS0' OR "activities"."concessionCode"='TR' OR "activities"."concessionCode"='ZD') 10 - filter("parts"."date">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "parts"."value"='E' AND ("parts"."type"='M' OR "parts"."type"='U') AND ("parts"."concessionCode"='G9' OR "parts"."concessionCode"='JS0' OR "parts"."concessionCode"='TR' OR "parts"."concessionCode"='ZD') AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 12 - access("activities"."concessionCode"="strings"."concessionCode" AND "activities"."activityCode"="strings"."activityCode" AND "strings"."language"='ENG') filter("strings"."concessionCode"='G9' OR "strings"."concessionCode"='JS0' OR "strings"."concessionCode"='TR' OR "strings"."concessionCode"='ZD') 

And finally, six values :

 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 186 | 4525 (1)| 00:00:55 | | 1 | SORT ORDER BY | | 1 | 186 | 4525 (1)| 00:00:55 | | 2 | HASH GROUP BY | | 1 | 186 | 4525 (1)| 00:00:55 | | 3 | NESTED LOOPS | | 1 | 186 | 4523 (1)| 00:00:55 | |* 4 | HASH JOIN | | 9 | 891 | 4514 (1)| 00:00:55 | |* 5 | HASH JOIN | | 136 | 8432 | 21 (5)| 00:00:01 | | 6 | INLIST ITERATOR | | | | | | | 7 | TABLE ACCESS BY INDEX ROWID| tb_concesions | 4 | 108 | 2 (0)| 00:00:01 | |* 8 | INDEX UNIQUE SCAN | pk_concession | 4 | | 1 (0)| 00:00:01 | |* 9 | TABLE ACCESS FULL | tb_activities | 644 | 22540 | 18 (0)| 00:00:01 | | 10 | INLIST ITERATOR | | | | | | |* 11 | TABLE ACCESS BY INDEX ROWID | tb_parts | 2155 | 79735 | 4493 (1)| 00:00:54 | |* 12 | INDEX RANGE SCAN | in_parts | 8620 | | 1277 (1)| 00:00:16 | | 13 | TABLE ACCESS BY INDEX ROWID | tb_strings | 1 | 87 | 1 (0)| 00:00:01 | |* 14 | INDEX UNIQUE SCAN | pk_string | 1 | | 0 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("parts"."activityCode"="activities"."activityCode" AND "parts"."concessionCode"="activities"."concessionCode") 5 - access("activities"."concessionCode"="concessions"."concessionCode") 8 - access("concessions"."concessionCode"='G9' OR "concessions"."concessionCode"='JS0' OR "concessions"."concessionCode"='TR' OR "concessions"."concessionCode"='ZD') 9 - filter("activities"."concessionCode"='G9' OR "activities"."concessionCode"='JS0' OR "activities"."concessionCode"='TR' OR "activities"."concessionCode"='ZD') 11 - filter("parts"."value"='E' AND ("parts"."type"='M' OR "parts"."type"='U')) 12 - access(("parts"."concessionCode"='G9' OR "parts"."concessionCode"='ZD') AND "parts"."date">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) filter("parts"."date">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "parts"."date"<=TO_DATE(' 2013-04-30 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 14 - access("activities"."concessionCode"="strings"."concessionCode" AND "activities"."activityCode"="strings"."activityCode" AND "strings"."language"='ENG') filter("strings"."concessionCode"='G9' OR "strings"."concessionCode"='JS0' OR "strings"."concessionCode"='TR' OR "strings"."concessionCode"='ZD') 

Since this is my first meeting with the implementation plan, I can only guess what the reason for the delay is. Between 4 and 6 values, I would suggest that this is a change from FULL ACCESS TO INDEX ACCESS. In addition, when accessing the table, the filter for four values ​​(id 10) contains all four concession values; while for six values, two concession values ​​are in the access part, and the filter contains only the date, type and value.

+6
source share
1 answer

In general, the reason for such anomalies is that the query optimizer cannot accurately predict the cost. The only way to know the exact cost is to simply run the statement several times with different execution plans. Instead, he uses statistics to estimate value, and sometimes an erroneous estimate.

When you compare discharge plans “with two values” with “with four values”, you can see that the latter gives a higher cost, and the plan is completely different. The optimizer had a choice between these two execution plans and should have thought that the first is better with two values, and the second with four values. However, in fact, the second was better in both cases.

If you carefully analyze such anomalies, you often find yourself in information such as a particular combination of values ​​that is overrepresented or underrepresented in your data. Using histograms in your statistics gives the optimizer more hints, and it can handle the “distorted data” better, but its forecasting capabilities will still be limited.

In practice, the solution is exactly what you did: rewrite SQL until you get acceptable performance. Often "hints" (in Oracle) can also give the optimizer additional hints.

0
source

All Articles