Which is faster - NOT IN or NOT EXISTS?

I have an insert-select statement that should only insert rows where a particular row id does not exist in either of the other two tables. Which of the following will be faster?

    INSERT INTO Table1 (...)
    SELECT (...)
    FROM Table2 t2
    WHERE ...
      AND NOT EXISTS (SELECT 'Y' FROM Table3 t3 WHERE t2.SomeFK = t3.RefToSameFK)
      AND NOT EXISTS (SELECT 'Y' FROM Table4 t4 WHERE t2.SomeFK = t4.RefToSameFK AND ...)

... or...

    INSERT INTO Table1 (...)
    SELECT (...)
    FROM Table2 t2
    WHERE ...
      AND t2.SomeFK NOT IN (SELECT RefToSameFK FROM Table3)
      AND t2.SomeFK NOT IN (SELECT RefToSameFK FROM Table4 WHERE ...)

... or do they do roughly the same thing? Also, is there any other way to structure this query that would be preferable? I usually don't like subqueries, because they add another "dimension" to the query, which increases the execution time by a polynomial factor.

+8
performance sql tsql
4 answers

It usually doesn't matter whether NOT IN is slower or faster than NOT EXISTS, because the two are NOT equivalent in the presence of NULL. Read:

NOT IN vs NOT EXISTS

When NULLs are possible, you almost always want NOT EXISTS, because it behaves the way people usually expect.
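
To make the NULL difference concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for the asker's database. The table and column names are borrowed from the question; the sample rows are invented for illustration. NOT IN follows standard SQL three-valued logic here, so other engines should behave the same way, but that is worth checking on your own system.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (SomeFK INTEGER);
    CREATE TABLE Table3 (RefToSameFK INTEGER);
    INSERT INTO Table2 VALUES (1), (2), (3);
    -- Illustrative data: one matching key and one NULL.
    INSERT INTO Table3 VALUES (1), (NULL);
""")

# NOT EXISTS: the NULL row in Table3 never matches, so it is simply ignored.
not_exists_rows = conn.execute("""
    SELECT t2.SomeFK
    FROM Table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.RefToSameFK = t2.SomeFK)
    ORDER BY t2.SomeFK
""").fetchall()

# NOT IN: comparing against a list that contains NULL is never TRUE.
not_in_rows = conn.execute("""
    SELECT t2.SomeFK
    FROM Table2 t2
    WHERE t2.SomeFK NOT IN (SELECT RefToSameFK FROM Table3)
    ORDER BY t2.SomeFK
""").fetchall()

print(not_exists_rows)  # [(2,), (3,)]
print(not_in_rows)      # []
```

A single NULL in the subquery result is enough to make the NOT IN version return no rows at all, while NOT EXISTS still returns the keys that have no match.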

If they are equivalent, your database has most likely already figured that out and will generate the same execution plan for both.

In the few cases where both options are equivalent and your database cannot figure that out, it is better to analyze both execution plans and choose the best option for your specific case.
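
As a rough sketch of comparing plans, the snippet below asks SQLite for its plan of each variant through `EXPLAIN QUERY PLAN`. SQLite and the index name `idx_t3_ref` are assumptions made only so the example can run; on another engine you would use that engine's own plan tooling, and the plans themselves will look different.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (SomeFK INTEGER NOT NULL);
    CREATE TABLE Table3 (RefToSameFK INTEGER NOT NULL);
    -- Hypothetical index on the lookup column.
    CREATE INDEX idx_t3_ref ON Table3 (RefToSameFK);
""")

queries = {
    "NOT EXISTS": """
        SELECT t2.SomeFK FROM Table2 t2
        WHERE NOT EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.RefToSameFK = t2.SomeFK)
    """,
    "NOT IN": """
        SELECT t2.SomeFK FROM Table2 t2
        WHERE t2.SomeFK NOT IN (SELECT RefToSameFK FROM Table3)
    """,
}

plans = {}
for label, sql in queries.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable step.
    plans[label] = [row[3] for row in rows]
    print(label)
    for detail in plans[label]:
        print("   ", detail)
```

The exact plan text depends on the engine and its version, so the useful part is reading the two plans side by side rather than any particular output.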

+9

You can use a LEFT OUTER JOIN and check whether the value from the right-hand table is NULL. If it is NULL, no matching row exists. This is one way to avoid subqueries.

 SELECT (...) FROM Table2 t2 LEFT OUTER JOIN Table3 t3 ON (t2.someFk = t3.ref) WHERE t3.someField IS NULL 
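
The anti-join pattern above can be tried out with a small sketch in Python and the standard-library sqlite3 module. SQLite, the sample rows, and the choice to test the join column itself for NULL are assumptions for illustration; the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (SomeFK INTEGER);
    CREATE TABLE Table3 (RefToSameFK INTEGER);
    INSERT INTO Table2 VALUES (1), (2), (3);
    -- Illustrative data: a duplicated match and a NULL.
    INSERT INTO Table3 VALUES (1), (1), (NULL);
""")

# LEFT OUTER JOIN keeps every Table2 row; unmatched rows get NULLs for t3.
anti_join_rows = conn.execute("""
    SELECT t2.SomeFK
    FROM Table2 t2
    LEFT OUTER JOIN Table3 t3 ON t2.SomeFK = t3.RefToSameFK
    WHERE t3.RefToSameFK IS NULL
    ORDER BY t2.SomeFK
""").fetchall()

# The same filter written with NOT EXISTS, for comparison.
not_exists_check = conn.execute("""
    SELECT t2.SomeFK
    FROM Table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.RefToSameFK = t2.SomeFK)
    ORDER BY t2.SomeFK
""").fetchall()

print(anti_join_rows)    # [(2,), (3,)]
print(not_exists_check)  # [(2,), (3,)]
```

Testing the join column for NULL, rather than some other column of the right-hand table, avoids false positives when that other column can legitimately hold NULL in a matched row.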
+1

It depends on the size of the tables, the available indexes, and the cardinality of those indexes.

If you do not get the same execution plan for both queries, and if neither plan performs a JOIN instead of a subquery, I would guess that the second version is faster. The first version is correlated and will therefore spawn many more subqueries, whereas the second version can be satisfied with three queries in total.

(Also note that different engines may be biased in one direction or the other. Some engines can correctly determine that the queries are the same, if they actually are, and resolve them to the same execution plan.)

+1

For large tables, NOT EXISTS / EXISTS is recommended, because the IN clause can run the subquery many times, depending on the table structure.

With a cost-based optimizer:

There is no difference.

0