A WHERE condition inside a subquery affects the main query. Is this a feature or a bug?

Assume two tables:

Table A: A1, A2, A_Other
Table B: B1, B2, B_Other

In the following examples, "is something" stands for a condition that compares against a fixed value, for example = 'ABC' or < 45.

I wrote a query similar to the following (1):

    Select *
    from A
    Where A1 IN (
        Select Distinct B1
        from B
        Where B2 is something
        And A2 is something
    );

What I really wanted to write was (2):

    Select *
    from A
    Where A1 IN (
        Select Distinct B1
        from B
        Where B2 is something
    )
    And A2 is something;

Strangely, both queries return the same result. Looking at the EXPLAIN plan for query 1, it appears that when the subquery was executed, the condition A2 is something, which does not apply to the subquery, was deferred and used as a filter on the results of the main query.

I would normally expect query 1 to fail, because the subquery on its own fails:

    Select Distinct B1
    from B
    Where B2 is something
    And A2 is something;
    --- ERROR: column "A2" does not exist

But I find that this is not the case: Postgres defers the inapplicable subquery condition to the main query.

Is this standard behavior or a Postgres anomaly? Where is it documented, and what is this feature called?

In addition, I found that if I add a column A2 to table B, only query 2 works as originally intended. In that case, the reference to A2 in query 2 still refers to A.A2, but the reference in query 1 refers to the new column B.A2, because that column is now directly available inside the subquery.

+8
sql postgresql subquery in-subquery
4 answers

Excellent question. This is something many people run into but never stop to examine.

What you wrote is a subquery in the WHERE clause, not an inline view in the FROM clause. There is a difference.

When you write a subquery in the SELECT or WHERE clause, you can reference the tables in the FROM clause of the main query. This is not specific to Postgres; it is standard behavior and can be observed in all leading RDBMSs, including Oracle, SQL Server, and MySQL.

When you run the first query, the optimizer examines the entire query and decides when to check each condition. That is the behavior you are seeing: the condition is deferred to the main query because the optimizer determines that it can evaluate the condition faster there without affecting the final result.

If you run only the subquery, with the main query commented out, it is bound to return the error you mentioned, because the column it refers to cannot be found.
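As a minimal sketch of the behavior described above, the following script builds two small tables and runs both forms of the query. The lowercase table names, the sample rows, and the concrete conditions (= 'ABC' and < 45, standing in for "is something") are assumptions made for illustration only.

```sql
-- Hypothetical sample data; column names follow the question.
CREATE TEMP TABLE a (a1 int, a2 int, a_other text);
CREATE TEMP TABLE b (b1 int, b2 text, b_other text);

INSERT INTO a VALUES
    (1, 10, 'matches'),
    (2, 99, 'a2 too large'),
    (3, 10, 'b2 does not match');
INSERT INTO b VALUES
    (1, 'ABC', NULL),
    (2, 'ABC', NULL),
    (3, 'XYZ', NULL);

-- Query 1: a2 is not a column of b, so it resolves to the outer table a.
SELECT *
FROM a
WHERE a1 IN (
    SELECT DISTINCT b1
    FROM b
    WHERE b2 = 'ABC'
      AND a2 < 45      -- outer reference to a.a2
);

-- Query 2: the same filter written at the outer level.
SELECT *
FROM a
WHERE a1 IN (
    SELECT DISTINCT b1
    FROM b
    WHERE b2 = 'ABC'
)
AND a2 < 45;

-- Run on its own, the inner SELECT has no outer query to resolve a2 against
-- and fails with: column "a2" does not exist
-- SELECT DISTINCT b1 FROM b WHERE b2 = 'ABC' AND a2 < 45;
```

With this data, both queries return only the row (1, 10, 'matches').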

In the last paragraph, you mentioned that you added column A2 to table B. What you noticed is correct. It is a consequence of how implicit references are resolved. If you do not qualify a column with a table alias, the database engine first looks for the column in the tables of the subquery's own FROM clause. Only if the column is not found there does it look in the tables of the main query. If you use the following query, with explicit aliases, it will still return the same result:

    Select *
    from A aa                       -- Check the alias
    Where A1 IN (
        Select Distinct B1
        from B bb
        Where B2 is something
        And aa.A2 is something      -- Check the reference
    );
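The name resolution order can be checked with a small experiment. This is only a sketch: the tables a_shadow and b_shadow, their rows, and the concrete conditions are hypothetical, chosen so that the unqualified and qualified references give visibly different results.

```sql
-- Hypothetical tables; b_shadow has its own a2 column, as in the last
-- paragraph of the question.
CREATE TEMP TABLE a_shadow (a1 int, a2 int);
CREATE TEMP TABLE b_shadow (b1 int, b2 text, a2 int);

INSERT INTO a_shadow VALUES (1, 10), (2, 99);
INSERT INTO b_shadow VALUES (1, 'ABC', 99), (2, 'ABC', 10);

-- Unqualified a2 binds to b_shadow.a2, because the innermost scope is
-- searched first.
SELECT *
FROM a_shadow
WHERE a1 IN (
    SELECT b1
    FROM b_shadow
    WHERE b2 = 'ABC'
      AND a2 < 45
);
-- returns the row (2, 99)

-- Qualifying the column forces the reference to the outer table.
SELECT *
FROM a_shadow aa
WHERE aa.a1 IN (
    SELECT bb.b1
    FROM b_shadow bb
    WHERE bb.b2 = 'ABC'
      AND aa.a2 < 45
);
-- returns the row (1, 10)
```

Qualifying every column in a subquery with an alias avoids this kind of silent change in meaning when a column is later added to the inner table.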

You may find more information in Korth's book on relational databases, but I'm not sure. I answered your question based on my own observations. I know that this happens and why; I just don't know which additional references to point you to.

+5

Correlated subquery: if the result of a subquery depends on a column value from a table in its parent query, the subquery is called a correlated subquery. This is standard behavior, not an error.

The column that the correlated subquery depends on does not have to appear in the select list of the parent query.

    Select *
    from A
    Where A1 IN (
        Select Distinct B1
        from B
        Where B2 is something
        And A2 is something
    );

A2 is a column of table A, and the parent query selects from table A. This means that you can reference A2 in the subquery. The above query may run slower than the following one.

    Select *
    from A
    Where A2 is something
    And A1 IN (
        Select Distinct B1
        from B
        Where B2 is something
    );

This is because the reference to A2 from the parent query may cause the subquery to be evaluated in a loop, once per row of A. Whether that matters depends on the conditions used to fetch the data. If the subquery looks like

 Select Distinct B1 from B Where B2 is A2 

we need to reference the parent query's column. Alternatively, we can use joins.
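A short sketch of both options, assuming hypothetical tables a_corr and b_corr with integer columns so that b2 can be compared to a2 directly:

```sql
-- Hypothetical sample data; b_corr contains a duplicate row on purpose.
CREATE TEMP TABLE a_corr (a1 int, a2 int);
CREATE TEMP TABLE b_corr (b1 int, b2 int);

INSERT INTO a_corr VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO b_corr VALUES (1, 10), (2, 99), (3, 30), (3, 30);

-- Correlated subquery: b.b2 is compared with the current outer row's a2.
SELECT *
FROM a_corr a
WHERE a.a1 IN (
    SELECT b.b1
    FROM b_corr b
    WHERE b.b2 = a.a2
);

-- Join form: DISTINCT removes the duplicates produced by repeated b rows.
SELECT DISTINCT a.*
FROM a_corr a
JOIN b_corr b
  ON b.b1 = a.a1
 AND b.b2 = a.a2;
```

With this data, both statements return the rows (1, 10) and (3, 30). Note that DISTINCT in the join form would also collapse genuinely duplicated rows of a_corr, which the subquery form keeps.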

+2

You already have an explanation of why correlated subqueries in the WHERE clause can refer to all columns from the tables in the FROM list.

In addition, a JOIN or an EXISTS semi-join is often significantly faster than a correlated subquery. I would rewrite the query as this 100% equivalent form:

    SELECT a.*
    FROM a
    JOIN (
        SELECT DISTINCT b1
        FROM b
        WHERE b2 is something
    ) b ON b.b1 = a.a1
    WHERE a.a2 is something

Or, even better:

    SELECT *
    FROM a
    WHERE EXISTS (
        SELECT 1
        FROM b
        WHERE b.b1 = a.a1
        AND b.b2 is something
    )
    AND a.a2 is something;
+2

The results are not strange: a subquery CAN reference the PARENT query. This is called a correlated subquery and is very common. In your example you used the IN operator, but a common way to OPTIMIZE a query that uses IN is to replace IN with EXISTS and a correlated subquery.

To clarify Erwin's point that EXISTS is faster: with IN, the query sometimes has to find the full set of values, whereas EXISTS only needs the first match that satisfies the condition. The query planner may well optimize both forms to the same plan, but writing EXISTS explicitly helps the optimizer arrive at a good plan quickly.
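Whether the two forms really differ can be checked with EXPLAIN rather than assumed. The sketch below uses hypothetical empty tables a_plan and b_plan and concrete stand-in conditions; the plans you get depend on the Postgres version, the table statistics, and the actual data, so no particular output is claimed here.

```sql
-- Hypothetical tables, created only so that the statements can be planned.
CREATE TEMP TABLE a_plan (a1 int, a2 int);
CREATE TEMP TABLE b_plan (b1 int, b2 text);

-- Plan for the IN form.
EXPLAIN
SELECT *
FROM a_plan a
WHERE a.a1 IN (
    SELECT b.b1
    FROM b_plan b
    WHERE b.b2 = 'ABC'
)
AND a.a2 < 45;

-- Plan for the EXISTS form.
EXPLAIN
SELECT *
FROM a_plan a
WHERE EXISTS (
    SELECT 1
    FROM b_plan b
    WHERE b.b1 = a.a1
      AND b.b2 = 'ABC'
)
AND a.a2 < 45;
```

If both statements show the same plan, the choice between IN and EXISTS is a matter of readability for that query; if they differ, EXPLAIN ANALYZE on realistic data is the more reliable way to compare them.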

0