Why does this (uncorrelated) subquery cause such problems?

Question

I have a large query in which a simple optimization of one subquery cut the run time from 8 minutes to 20 seconds. I'm not sure I understand why that optimization had such a drastic effect.

In essence, here is the problem:

 SELECT (bunch of stuff)
 FROM a
     LEFT OUTER JOIN b ON a.ID = b.a
     LEFT OUTER JOIN c ON b.ID = c.b
     ...
     ...
     INNER JOIN veryLargeTable
         ON a.ID = veryLargeTable.a
         AND veryLargeTable.PetID = (SELECT id from Pets WHERE Pets.Name = 'Something') /* BAD! */
     ...
     ...

In total, there are 16 joined tables. If I replace the second join predicate on veryLargeTable with a pre-populated variable containing the pet ID (instead of using the subquery), the whole query speeds up dramatically:

 AND veryLargeTable.PetID = @petID /* Awesome! */
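For reference, pre-populating the variable might look something like the sketch below. The temp tables #Pets and #veryLargeTable are stand-ins with made-up columns and rows so that the snippet runs on its own, and INT is an assumed type for the ID; the real query has many more joins.

```sql
-- Stand-in tables; schema and data are assumptions for illustration only.
CREATE TABLE #Pets (id INT NOT NULL, Name VARCHAR(50) NOT NULL);
CREATE TABLE #veryLargeTable (a INT NOT NULL, PetID INT NOT NULL);

-- SQL Server 2005 has no multi-row VALUES, so rows are added with UNION ALL.
INSERT INTO #Pets (id, Name)
SELECT 1, 'Something' UNION ALL SELECT 2, 'Other';

INSERT INTO #veryLargeTable (a, PetID)
SELECT 10, 1 UNION ALL SELECT 11, 2 UNION ALL SELECT 12, 1;

-- Resolve the pet ID once, before the main query runs.
-- SET with a scalar subquery raises an error if more than one row matches.
DECLARE @petID INT;
SET @petID = (SELECT id FROM #Pets WHERE Name = 'Something');

-- The filter on the large table now compares against a plain variable.
SELECT v.a, v.PetID
FROM #veryLargeTable AS v
WHERE v.PetID = @petID;

DROP TABLE #veryLargeTable;
DROP TABLE #Pets;
```

Using SET rather than SELECT for the assignment keeps the same "exactly one value" expectation that the original scalar subquery had.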

Obviously, (SELECT id from Pets WHERE Name = 'Something') is being executed for every row. There are two things that I don't fully understand:

1. As far as I can tell, this is an uncorrelated subquery. The Pets table is not part of the outer query at all. Aren't uncorrelated subqueries evaluated independently (and therefore optimized)? Why isn't that the case here?
2. The execution plans are dramatically different. In the slow case (above), the entire subtree deals with an estimated 950k rows. In the fast case (using a variable instead of a subquery), there are only about 125k estimated rows. What is going on? Why are there so many more rows when the subquery is present? The Pets.Name column definitely contains unique data (but has no unique constraint, as far as I can tell).

Note that moving the predicate to the WHERE clause does not affect the query in any way, as you would expect, since it is an INNER JOIN.

Insights are appreciated!

+4

sql sql-server tsql subquery sql-server-2005

womp Aug 26 '10 at 17:19

4 answers

Alternatively, I think you could eliminate the subquery with:

 ...
 INNER JOIN veryLargeTable vLT
     ON a.ID = vLT.a
 INNER JOIN Pets p
     ON vLT.PetID = p.id
     AND p.Name = 'Something'
 ...

+4

Joe Stefanelli Aug 26 '10 at 17:40

Personally, I don't find the result surprising if there is no index on Pets.Name. If you create a unique index on Pets.Name, you are likely to see better results. Without that index, from the server's point of view the subquery could return multiple rows or NULL. Perhaps the optimizer could do better, but it often needs help.
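A minimal sketch of such an index follows. The index name IX_Pets_Name is made up, and #Pets is a stand-in temp table with an assumed schema so the snippet runs on its own; on the real table the statement would target Pets instead.

```sql
-- Stand-in for the real Pets table; columns and rows are assumptions.
CREATE TABLE #Pets (id INT NOT NULL, Name VARCHAR(50) NOT NULL);

INSERT INTO #Pets (id, Name)
SELECT 1, 'Something' UNION ALL SELECT 2, 'Other';

-- IX_Pets_Name is a hypothetical name.
-- The statement fails if Name already contains duplicate values.
CREATE UNIQUE NONCLUSTERED INDEX IX_Pets_Name ON #Pets (Name);

-- With the unique index in place, the server knows this lookup
-- can match at most one row.
SELECT id FROM #Pets WHERE Name = 'Something';

DROP TABLE #Pets;
```

Whether this changes the plan for the full 16-table query is something to check in the execution plan rather than assume.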

0

vaso Aug 27 '10 at 6:12

The likely reason, as you pointed out and as I have seen in my own experience, is that even the simplest uncorrelated subqueries are often re-evaluated by the SQL Server query optimizer.

For example, you can look at the execution plan for the following query and see that the uncorrelated subquery is evaluated again.

 SELECT ID FROM #table1 WHERE ID in (SELECT ID from #table1)
 UNION ALL
 SELECT ID FROM #table1 WHERE ID in (SELECT ID from #table1)
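One way to inspect the plan for that query yourself is sketched below. The contents of #table1 are made up for illustration; SET STATISTICS PROFILE ON runs the statement and returns the plan operators as an extra result set, where you can count how many times #table1 is scanned.

```sql
-- Made-up contents for #table1; only the shape of the plan matters here.
CREATE TABLE #table1 (ID INT NOT NULL);

INSERT INTO #table1 (ID)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

-- Return the executed plan as an additional result set.
SET STATISTICS PROFILE ON;

SELECT ID FROM #table1 WHERE ID IN (SELECT ID FROM #table1)
UNION ALL
SELECT ID FROM #table1 WHERE ID IN (SELECT ID FROM #table1);

SET STATISTICS PROFILE OFF;

DROP TABLE #table1;
```

The graphical plan in Management Studio shows the same information if you prefer that view.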

The re-evaluation happens with or without a clustered index on the ID attribute. As someone noted, you can rewrite this query to use a join instead of a subquery. However, that is not always straightforward: if the subquery instead returns an aggregate scalar, for example

 where ID = (select MAX(ID) from #table1)

then rewriting it as a join may not work that easily.
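For the simple MAX(ID) case, two possible workarounds are sketched below: computing the scalar once into a variable, or joining to a one-row derived table. The contents of #table1 and the names @maxID and MaxID are assumptions for illustration, and neither form is guaranteed to produce a better plan in a larger query.

```sql
-- Made-up contents for #table1.
CREATE TABLE #table1 (ID INT NOT NULL);

INSERT INTO #table1 (ID)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

-- Option 1: evaluate the aggregate once and reuse it as a variable.
DECLARE @maxID INT;
SET @maxID = (SELECT MAX(ID) FROM #table1);

SELECT ID FROM #table1 WHERE ID = @maxID;

-- Option 2: express the aggregate as a one-row derived table and join to it.
SELECT t.ID
FROM #table1 AS t
INNER JOIN (SELECT MAX(ID) AS MaxID FROM #table1) AS m
    ON t.ID = m.MaxID;

DROP TABLE #table1;
```

Both statements return the same row here; the more the aggregate depends on other parts of the query, the harder this kind of rewrite becomes.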

0

T. Webster Mar 6 '12 at 3:02

Philip Kelley · Accepted Answer · 2010-08-26T17:50:23+0000

In my experience, the more complex your queries get, the less able the SQL optimizer is to create clever plans. Here you have 16 joins, some or most of them outer joins, and at least one subquery ... throw in enough indexes, cardinalities, views, outer applies, and who knows what else, and no one, not even Microsoft engineers*, can devise routines that will uniformly and regularly generate the most optimal plans.

I have experienced what you describe many times: change one simple thing in a messy query and everything is an order of magnitude faster (or, gnashing of teeth, slower). I have no method for determining when complex becomes too complex; it is more a feeling than anything else. My general rule of thumb is that if a query looks too long or too complicated, simplify where you can, for example with your pre-selected single value, or by pulling out a part of the query that will always run quickly with a small result set, running it first, and storing the results in a temp table.

(* Please note that this is mild sarcasm)
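The temp-table suggestion in this answer could look roughly like the sketch below. All table names, columns, and rows are stand-ins (the question does not show the real schema), and #SelectedPet is a made-up name for the staged result.

```sql
-- Stand-in tables; schema and data are assumptions for illustration only.
CREATE TABLE #Pets (id INT NOT NULL, Name VARCHAR(50) NOT NULL);
CREATE TABLE #veryLargeTable (a INT NOT NULL, PetID INT NOT NULL);

INSERT INTO #Pets (id, Name)
SELECT 1, 'Something' UNION ALL SELECT 2, 'Other';

INSERT INTO #veryLargeTable (a, PetID)
SELECT 10, 1 UNION ALL SELECT 11, 2 UNION ALL SELECT 12, 1;

-- Step 1: run the small, fast part of the query first and stage its result.
SELECT id
INTO #SelectedPet
FROM #Pets
WHERE Name = 'Something';

-- Step 2: the large query joins to the tiny staged table
-- instead of embedding the subquery.
SELECT v.a, v.PetID
FROM #veryLargeTable AS v
INNER JOIN #SelectedPet AS sp
    ON v.PetID = sp.id;

DROP TABLE #SelectedPet;
DROP TABLE #veryLargeTable;
DROP TABLE #Pets;
```

A temp table carries its own statistics, which can give the optimizer a more realistic row estimate than it derives for the embedded subquery, though that is worth confirming against the actual plan.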
