Suppose a project uses partitions to structure its data. This concept has a purely specific business and is not related to database partitioning.
Say the business logic is:
- remove from output_table where partition = <partitionX>
- insert into output_table (select * from input_table, where partition = <partitionX>)
Keeping in mind that everything is so structured, let it complicate the problem (in order to proceed to the urgent issue).
Suppose I have a query (SELECT query) that is a potential killer, in terms of time:
insert into output_table ( select * from input_table left outer join additional_table additional_table1 on input_table.id = additional_table1.id left outer join additional_table additional_table2 on additional_table2.id = additional_table1.parent where partition = <partitionX> )
Let me optimize this and study the parameters. Remember that each table has sections. Also note how table 2 joins twice, but in different columns. And also note how an additional table is added on itself
Everything uses the WITH clause, but there are several options, and I would like to know why one of them is better.
a. direct and repeated queries in the WITH section
WITH CACHED_input_table AS ( SELECT * FROM input_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table1 AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table2 AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ) SELECT * FROM CACHED_input_table input_table LEFT OUTER JOIN CACHED_additional_table1 additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table2 additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID
C. reusing a query in a FROM clause
WITH CACHED_input_table AS ( SELECT * FROM input_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ) SELECT * FROM CACHED_input_table input_table LEFT OUTER JOIN CACHED_additional_table additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID
C. reusing a query in the WITH section
WITH CACHED_input_table AS ( SELECT * FROM input_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table1 AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table2 AS ( SELECT * FROM CACHED_additional_table1 ) SELECT * FROM CACHED_input_table input_table LEFT OUTER JOIN CACHED_additional_table1 additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table2 additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID
From experience, option A is the fastest. But why? Can someone explain this? (I play on Oracle v11.2)
I know that perhaps my optimization around this particular concept of company partitions has nothing to do with the general sql optimization around the WITH clause I am asking for, but please take this as an example in real life.
Explain plans
Option A (9900 lines in 7 s)
------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 1037 | 18540 (8)| 00:00:03 | | | |* 1 | HASH JOIN OUTER | | 1 | 1037 | 18540 (8)| 00:00:03 | | | |* 2 | HASH JOIN OUTER | | 1 | 605 | 9271 (8)| 00:00:02 | | | | 3 | PARTITION LIST SINGLE| | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY | | 4 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 | | 5 | PARTITION LIST SINGLE| | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 6 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | | 7 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 8 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | ------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("additional_table"."PARENT"="additional_table"."ID"(+)) 2 - access("input_table"."ID"="additional_table"."ID"(+))
Option B (9900 lines in 10 s)
--------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA2_C26AF925 | | | | | | | | 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 4 | TABLE ACCESS FULL | additional_table1 | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | |* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | | |* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | | | 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY | | 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 | | 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | | 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | --------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("additional_table1"."PARENT"="additional_table2"."ID"(+)) 6 - access("input_table"."ID"="additional_table1"."ID"(+))
Option C (9900 lines in 17 seconds)
--------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA7_C26AF925 | | | | | | | | 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 4 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | |* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | | |* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | | | 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY | | 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 | | 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | | 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | --------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("additional_table1"."PARENT_ID"="CACHED_additional_table"."ID"(+)) 6 - access("input_table"."ID"="additional_table1"."ID"(+))
EDIT :
- explanation plans added.
- edited basic query: there is an input_table and an additional table that is connected twice, once on input_table and once on itself
- the edited query for option A: there is input_table, and additional_value is combined twice, once on input_table and once on a duplicate of itself (additional_table)
- the edited query for option B: there is input_table, and additional_value is combined twice, once on input_table and once on itself, using the same alias (additional_table)
- the edited query for option C: there is input_table, and the additional_value is combined twice, once on input_table and once on another table created from itself in the WITH section