Will this query load the entire table into memory?

If I have a really large table, will this query load the entire table into memory before it filters the result set?

with parent as ( select * from a101 ) select * from parent where value1 = 159 

As you can see, the parent query selects the entire table. Will it all be loaded into memory? This is a very simplified version of the query. In the real query, there are several joins with other tables. I am evaluating SQL Server 2012 and PostgreSQL.

4 answers

In PostgreSQL (as of 9.4, at least), CTEs act as optimization fences.

The query optimizer will not flatten CTE terms into the outer query, push down qualifiers, or pull up qualifiers, even in trivial cases. Thus, an unqualified SELECT inside a CTE term will always perform a full table scan (or an index-only scan, if there is a suitable index).

So in PostgreSQL, these two queries are very different, as a simple EXPLAIN will show:

 with parent as ( select * from a101 ) select * from parent where value1 = 159 

and

 SELECT * FROM ( SELECT * FROM a101 ) AS parent WHERE value1 = 159; 
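To see the difference for yourself, the two forms can be run under EXPLAIN side by side. This is a minimal sketch using the a101 table and value1 column from the question. The third statement is an added illustration, not part of the comparison above: it writes the filter inside the CTE term by hand, which is one way to avoid the unfiltered scan on versions where the CTE acts as a fence.

```sql
-- CTE form: on the PostgreSQL versions discussed here, the filter is
-- applied only after the CTE term has scanned all of a101.
EXPLAIN
WITH parent AS (SELECT * FROM a101)
SELECT * FROM parent WHERE value1 = 159;

-- Subquery form: the planner can flatten the subquery and push the
-- filter down to the scan of a101.
EXPLAIN
SELECT * FROM (SELECT * FROM a101) AS parent WHERE value1 = 159;

-- Illustration only: the filter is moved into the CTE term by hand,
-- so the CTE itself is no longer an unqualified SELECT.
EXPLAIN
WITH parent AS (SELECT * FROM a101 WHERE value1 = 159)
SELECT * FROM parent;
```

The exact plans and cost figures depend on the PostgreSQL version, the table statistics, and the available indexes.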

However, "scans the entire table" does not necessarily mean "loads the entire table into memory". PostgreSQL will use a tuplestore, which transparently spills to a temporary file on disk as it grows.

The original rationale was that DML in CTE terms was planned (and later implemented). If there is DML in a CTE term, it is vital that its execution be predictable and complete. The same may be true if the CTE calls data-modifying functions.

Unfortunately, no one seemed to think "... but what if it's just a SELECT and we want to inline it?".

Many in the community seem to see this as a feature and regularly promote it as a workaround for optimizer issues. I find this attitude completely perplexing. As a result, it will be very difficult to fix this later, because people intentionally use CTEs when they want to stop the optimizer from altering a query.

In other words, PostgreSQL users abuse CTEs as pseudo query hints (along with the OFFSET 0 hack), because project policy says that real query hints are neither desired nor supported.
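For reference, the OFFSET 0 hack mentioned here looks roughly like the sketch below, written against the a101 table from the question. OFFSET 0 removes no rows, but PostgreSQL does not flatten a subquery that carries an OFFSET clause, so the outer filter is not pushed down into it. This relies on planner behavior rather than on any dedicated hint syntax.

```sql
-- OFFSET 0 skips zero rows, so the result is the same as without it,
-- but the subquery is kept as a separate planning unit.
EXPLAIN
SELECT *
FROM (SELECT * FROM a101 OFFSET 0) AS parent
WHERE value1 = 159;
```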

AFAIK MS SQL Server can optimize across CTE boundaries, but can also choose to materialize a result set.


I just ran EXPLAIN for this query in PostgreSQL. Surprisingly, it does a sequential scan instead of an index lookup:

 CTE Scan on parent  (cost=123.30..132.97 rows=2 width=1711)
   Filter: (value1 = 159)
   CTE parent
     ->  Seq Scan on a101  (cost=0.00..123.30 rows=430 width=2060)

I have a primary key index on value1, and it is used for the simple query select * from a101 where value1 = 159.

So the answer is that it scans the entire table. I am surprised; I thought this would work like a view or a subquery, but it does not. You can write it as a subquery instead, which does use the index:

 select * from (select * from a101) parent where value1 = 159

No. Queries are evaluated as a whole. If you look at the execution plan, you will see that the filter predicate is applied to the inner select. Given how trivial the outer select is, I am sure that it will be optimized away.

Checking execution plans is basic knowledge, and you had better learn how to do it quickly. Once you run into a real performance problem, you will need to track down its cause, and that is where execution plans come in.
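As a rough sketch of what checking a plan looks like on the two systems the question mentions, again using the a101 table from the question: the PostgreSQL part uses EXPLAIN with the ANALYZE and BUFFERS options, and the SQL Server part uses SET SHOWPLAN_TEXT. The two parts are separate dialects and are meant to be run on their respective servers, not together. GO is the batch separator used by tools such as sqlcmd and SSMS, not a T-SQL statement.

```sql
-- PostgreSQL: ANALYZE actually executes the query and reports real row
-- counts and timings; BUFFERS adds buffer usage figures.
EXPLAIN (ANALYZE, BUFFERS)
WITH parent AS (SELECT * FROM a101)
SELECT * FROM parent WHERE value1 = 159;

-- SQL Server: SHOWPLAN_TEXT returns the estimated plan as text without
-- executing the query. The SET statement must be alone in its batch.
SET SHOWPLAN_TEXT ON;
GO
WITH parent AS (SELECT * FROM a101)
SELECT * FROM parent WHERE value1 = 159;
GO
SET SHOWPLAN_TEXT OFF;
GO
```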


CTEs are just language syntax to make the code more readable; they do not affect the performance of query execution.

When a query is executed, it is processed according to SQL Server's predefined logical processing steps:

 1. FROM
 2. ON
 3. OUTER
 4. WHERE
 5. GROUP BY
 6. CUBE | ROLLUP
 7. HAVING
 8. SELECT
 9. DISTINCT
 10. ORDER BY
 11. TOP

The WHERE filter is applied first, and only then are the records selected, so the full table will not be loaded into memory.

