Why are queries faster when subqueries inside a WITH clause are duplicated?

Question

Why are queries faster when subqueries inside a WITH clause are duplicated?

Suppose a project uses partitions to structure its data. This concept has a purely specific business and is not related to database partitioning.

Say the business logic is:

remove from output_table where partition = <partitionX>
insert into output_table (select * from input_table, where partition = <partitionX>)

Keeping in mind that everything is so structured, let it complicate the problem (in order to proceed to the urgent issue).

Suppose I have a query (SELECT query) that is a potential killer, in terms of time:

insert into output_table ( select * from input_table left outer join additional_table additional_table1 on input_table.id = additional_table1.id left outer join additional_table additional_table2 on additional_table2.id = additional_table1.parent where partition = <partitionX> )

Let me optimize this and study the parameters. Remember that each table has sections. Also note how table 2 joins twice, but in different columns. And also note how an additional table is added on itself

Everything uses the WITH clause, but there are several options, and I would like to know why one of them is better.

a. direct and repeated queries in the WITH section

 WITH CACHED_input_table AS ( SELECT * FROM input_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table1 AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table2 AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ) SELECT * FROM CACHED_input_table input_table LEFT OUTER JOIN CACHED_additional_table1 additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table2 additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID

C. reusing a query in a FROM clause

 WITH CACHED_input_table AS ( SELECT * FROM input_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ) SELECT * FROM CACHED_input_table input_table LEFT OUTER JOIN CACHED_additional_table additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID

C. reusing a query in the WITH section

 WITH CACHED_input_table AS ( SELECT * FROM input_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table1 AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ), CACHED_additional_table2 AS ( SELECT * FROM CACHED_additional_table1 ) SELECT * FROM CACHED_input_table input_table LEFT OUTER JOIN CACHED_additional_table1 additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table2 additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID

From experience, option A is the fastest. But why? Can someone explain this? (I play on Oracle v11.2)

I know that perhaps my optimization around this particular concept of company partitions has nothing to do with the general sql optimization around the WITH clause I am asking for, but please take this as an example in real life.

Explain plans

Option A (9900 lines in 7 s)

 ------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 1037 | 18540 (8)| 00:00:03 | | | |* 1 | HASH JOIN OUTER | | 1 | 1037 | 18540 (8)| 00:00:03 | | | |* 2 | HASH JOIN OUTER | | 1 | 605 | 9271 (8)| 00:00:02 | | | | 3 | PARTITION LIST SINGLE| | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY | | 4 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 | | 5 | PARTITION LIST SINGLE| | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 6 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | | 7 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 8 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | ------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("additional_table"."PARENT"="additional_table"."ID"(+)) 2 - access("input_table"."ID"="additional_table"."ID"(+))

Option B (9900 lines in 10 s)

 --------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA2_C26AF925 | | | | | | | | 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 4 | TABLE ACCESS FULL | additional_table1 | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | |* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | | |* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | | | 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY | | 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 | | 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | | 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA2_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | --------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("additional_table1"."PARENT"="additional_table2"."ID"(+)) 6 - access("input_table"."ID"="additional_table1"."ID"(+))

Option C (9900 lines in 17 seconds)

 --------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2813 | 18186 (11)| 00:00:03 | | | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6CA7_C26AF925 | | | | | | | | 3 | PARTITION LIST SINGLE | | 1362K| 561M| 9248 (8)| 00:00:02 | KEY | KEY | | 4 | TABLE ACCESS FULL | additional_table | 1362K| 561M| 9248 (8)| 00:00:02 | 24 | 24 | |* 5 | HASH JOIN OUTER | | 1 | 2813 | 8939 (15)| 00:00:02 | | | |* 6 | HASH JOIN OUTER | | 1 | 1493 | 4470 (15)| 00:00:01 | | | | 7 | PARTITION LIST SINGLE | | 1 | 173 | 2 (0)| 00:00:01 | KEY | KEY | | 8 | TABLE ACCESS FULL | input_table | 1 | 173 | 2 (0)| 00:00:01 | 24 | 24 | | 9 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | | 11 | VIEW | | 1362K| 1714M| 4447 (14)| 00:00:01 | | | | 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6CA7_C26AF925 | 1362K| 561M| 4447 (14)| 00:00:01 | | | --------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("additional_table1"."PARENT_ID"="CACHED_additional_table"."ID"(+)) 6 - access("input_table"."ID"="additional_table1"."ID"(+))

EDIT :

explanation plans added.
edited basic query: there is an input_table and an additional table that is connected twice, once on input_table and once on itself
the edited query for option A: there is input_table, and additional_value is combined twice, once on input_table and once on a duplicate of itself (additional_table)
the edited query for option B: there is input_table, and additional_value is combined twice, once on input_table and once on itself, using the same alias (additional_table)
the edited query for option C: there is input_table, and the additional_value is combined twice, once on input_table and once on another table created from itself in the WITH section

+6

sql oracle query-optimization

Belun Jun 22 '16 at 2:06

source share

4 answers

Oracle has the ability to materialize the subqueries defined in the with clause if it considers it beneficial. Usually (but not necessarily always!), This will be done if you refer to the same subquery more than once in the main query.

When Oracle materializes a subquery, it runs sql and then saves the results in a global temporary table overs. Then, for subsequent calls, it queries the temporary table.

In your case, I see that option A repeats the same query as the subquery - you will need to check your execution plans to see what Oracle is doing behind the scenes.

+1

Boneist Jun 22 '16 at 14:14

source share

Generic table expressions (WITH clause) should be very similar to a regular choice with union / subqueries when not using recursive (after all, their purpose). Perhaps he can better optimize two links to the same table.

You will need to use the actual execution plan to find any differences, and this will be specific to your installation, so it is difficult for you to answer this question.

I doubt that there will be any significant differences between these requests, but (I suppose Oracle) you can use another thing to optimize INSERT - APPEND hint :

 INSERT /* + APPEND */ INTO YourTable SELECT ...

+1

sagi Jun 22 '16 at 14:14

source share

 insert into output_table ( select * from input_table left outer join additional_table additional_table1 on input_table.id = additional_table1.id left outer join additional_table additional_table2 on additional_table2.id = additional_table1.parent where partition = <partitionX> )

If the baseline is above, then option A is NOT equivalent. The following would be closer, I think.

 insert into output_table ( select * from input_table left outer join additional_table additional_table1 on input_table.id = additional_table1.id and additional_table1.partition = <partitionX> left outer join additional_table additional_table2 on additional_table2.id = additional_table1.parent and additional_table2.partition = <partitionX> where partition = <partitionX> )

The point that is in option A has reduced the size of the linked tables. In the baseline, this is not true.

B.1. Single CTE as the basis for both connections

 WITH CACHED_additional_table AS ( SELECT * FROM additional_table WHERE PARTITION_ID = < partition X > ) SELECT * FROM input_table input_table LEFT OUTER JOIN CACHED_additional_table additional_table1 ON input_table.ID = additional_table1.ID LEFT OUTER JOIN CACHED_additional_table additional_table2 ON additional_table1.PARENT_ID = additional_table2.ID

The difference between this option and option B is that you only cache one query result and then use that one common table expression (CTE) twice in the main query. This is a good precedent for CTE (avoiding repetition).

0

Used_By_Already Jun 24 '16 at 8:16

source share

Rob van Wijk · Accepted Answer · 2016-06-24T08:31:24+0000

Query A should read three sections, one section from input_table, and two times one section of sub_table.

Query B should read two sections, one section of input_table and one section of an additional table. Then he should write this one section to the temporary table and read this temporary table twice.

So, assuming the scores are fine: Query A reads 1 row in the input_table section + 2 times 1362K rows in the extra_ table

Query B reads 1 row in the input_table section + 3 times 1362K rows in an additional table + temporary table + writes 1362K rows.

If the optimizer decides to materialize your factorized subqueries, you are worse. You can prevent materialization using the built-in tooltip, among other things.

Why are queries faster when subqueries inside a WITH clause are duplicated?

Explain plans

More articles: