Query takes longer after adding unused WHERE clauses

I hit an interesting snag (interesting to me, at least). Below is a general idea of what my query looks like. Assume @AuthorType is an input to a stored procedure and that each place where I have put a comment stands for various specialized conditions.

SELECT *
FROM TBooks
WHERE (/* ...SOME CONDITIONS */)
   OR (@AuthorType = 1 AND /* ...DIFFERENT CONDITIONS */)
   OR (@AuthorType = 2 AND /* ...STILL MORE CONDITIONS */)

Interestingly, if I run this SP with @AuthorType = 0, it runs slower than if I remove the last two sets of conditions (the ones that add conditions for the specialized @AuthorType values).

Shouldn't SQL Server realize at run time that these conditions will never be met and ignore them completely? The difference I see is noticeable; it roughly doubles the query time (from 1-2 seconds to 3-5 seconds).

Am I expecting SQL Server to optimize too much for me? Do I really need to have 3 separate SPs for the specialized conditions?

+1
sql sql-server where-clause
3 answers

Should SQL Server realize that these conditions will never be met and ignore them completely?

No, absolutely not. There are two factors here.

  • SQL Server does not guarantee boolean operator short-circuiting. See the article on SQL Server boolean operator short-circuit for an example showing how query optimization can change the order in which boolean expressions are evaluated. While at first glance this looks like a bug to the imperative, C-like programming mindset, it is the right thing for the declarative, set-oriented world of SQL.

  • OR is the enemy of SQL SARGability. SQL statements are compiled into an execution plan, and then the plan is executed. The plan is cached and reused between calls. Thus, the SQL compiler must generate one single plan that is suitable for all individual cases of the OR (@AuthorType = 1 AND @AuthorType = 2 AND @AuthorType = 3). When it comes to creating the query plan, it is, in a sense, exactly as if @AuthorType had all the values at once. The result is almost always the worst possible plan, one that cannot benefit from any index because the different OR branches contradict each other, so it ends up scanning the entire table and checking the rows one by one.
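To illustrate the first point, here is a minimal sketch of a query whose predicates read as if the first one guards the second. The temporary table #Demo and its column Val are hypothetical, made up for this illustration. Because evaluation order is not guaranteed, the query may or may not raise a conversion error depending on the plan the optimizer picks.

```sql
-- Hypothetical demo table; the names are illustrative only.
CREATE TABLE #Demo (Val varchar(10) NOT NULL);
INSERT INTO #Demo (Val) VALUES ('10'), ('20'), ('abc');

-- Reads as if ISNUMERIC protects the CONVERT, but SQL Server is free to
-- evaluate the CONVERT first, so this MAY fail on the 'abc' row.
SELECT Val
FROM #Demo
WHERE ISNUMERIC(Val) = 1
  AND CONVERT(int, Val) > 15;

DROP TABLE #Demo;
```

The same freedom is why an `@AuthorType = 1 AND ...` test cannot be relied on to skip the conditions that follow it.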

The best thing you can do in your case, and in any other case that involves a boolean OR, is to move the @AuthorType test outside the query:

 IF (@AuthorType = 1)
     SELECT ... FROM ... WHERE ...
 ELSE IF (@AuthorType = 2)
     SELECT ... FROM ... WHERE ...
 ELSE
     ...

Because each branch is clearly separated into its own statement, SQL Server can create the proper access path for each individual case.
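As a rough sketch of how this might look for the query in the question: the procedure name, the parameter @Id, and the columns IsFeatured, AuthorId and PublisherId are all hypothetical stand-ins for the elided conditions, not taken from the original.

```sql
-- Sketch only. Assumes TBooks has hypothetical columns
-- IsFeatured bit NOT NULL, AuthorId int, PublisherId int.
CREATE PROCEDURE dbo.GetBooksByAuthorType
    @AuthorType int,
    @Id         int = NULL   -- hypothetical extra parameter
AS
BEGIN
    SET NOCOUNT ON;

    IF (@AuthorType = 1)
    BEGIN
        -- Base conditions plus the ones specific to @AuthorType = 1
        SELECT * FROM TBooks
        WHERE IsFeatured = 1 OR AuthorId = @Id;
    END
    ELSE IF (@AuthorType = 2)
    BEGIN
        -- Base conditions plus the ones specific to @AuthorType = 2
        SELECT * FROM TBooks
        WHERE IsFeatured = 1 OR PublisherId = @Id;
    END
    ELSE
    BEGIN
        -- Any other value (e.g. 0): base conditions only
        SELECT * FROM TBooks
        WHERE IsFeatured = 1;
    END
END
```

Each SELECT here is compiled as its own statement, so the @AuthorType = 0 path no longer carries the specialized conditions at all.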

The next best thing is to use UNION ALL, as chadhoc has already suggested; it is the right approach in views or other places where a single statement is required (no IF is permitted).
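One place where a single statement is required is an inline table-valued function. The following is only a sketch under assumptions: the function name, the parameter @Id, and the columns BookId, Title, IsFeatured, AuthorId and PublisherId are hypothetical.

```sql
-- Sketch only. Assumes TBooks has hypothetical columns BookId, Title,
-- IsFeatured bit NOT NULL, AuthorId int, PublisherId int.
CREATE FUNCTION dbo.BooksByAuthorType (@AuthorType int, @Id int)
RETURNS TABLE
AS
RETURN
(
    -- Base conditions, always applied
    SELECT BookId, Title
    FROM TBooks
    WHERE IsFeatured = 1

    UNION ALL

    -- Only contributes rows when @AuthorType = 1.
    -- IsFeatured = 0 excludes rows the first branch already returned,
    -- so UNION ALL does not produce duplicates.
    SELECT BookId, Title
    FROM TBooks
    WHERE @AuthorType = 1 AND AuthorId = @Id AND IsFeatured = 0

    UNION ALL

    -- Only contributes rows when @AuthorType = 2
    SELECT BookId, Title
    FROM TBooks
    WHERE @AuthorType = 2 AND PublisherId = @Id AND IsFeatured = 0
);
```

It could then be queried with something like `SELECT * FROM dbo.BooksByAuthorType(1, 42);`. The exclusion predicate in the later branches is a design choice to keep the result equivalent to the original OR query; it only works as written if IsFeatured cannot be NULL.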

+6

This is due to how difficult it is for the optimizer to handle OR logic, combined with parameter sniffing issues. Try changing your query above to a UNION approach, as mentioned in the post here. That is, you end up with several statements unioned together, each with just a single @AuthorType = x AND ..., which allows the optimizer to rule out the parts where the AND logic does not match the given @AuthorType and, in turn, seek into the appropriate indexes. It would look something like this:

 SELECT * FROM TBooks
 WHERE (/* ...SOME CONDITIONS */)
   AND @AuthorType = 1 AND (/* ...DIFFERENT CONDITIONS */)
 UNION ALL
 SELECT * FROM TBooks
 WHERE (/* ...SOME CONDITIONS */)
   AND @AuthorType = 2 AND (/* ...DIFFERENT CONDITIONS */)
 UNION ALL
 ...
+4

I have to fight the urge to reduce duplication ... but man, I really don't like it.

Would this "feel" better?

  SELECT ... lots of columns and complicated stuff ...
 FROM 
 (
     SELECT MyPK
     FROM TBooks
     WHERE 
     (/* ... SOME CONDITIONS */) 
     AND @AuthorType = 1 AND (/* ... DIFFERENT CONDITIONS */) 
     UNION ALL 
     SELECT MyPK
     FROM TBooks
     WHERE 
     (/* ... SOME CONDITIONS */) 
     AND @AuthorType = 2 AND (/* ... DIFFERENT CONDITIONS */) 
     UNION ALL 
     ... 
 ) AS B1
 JOIN TBooks AS B2
     ON B2.MyPK = B1.MyPK
 JOIN ... other tables ...

The derived table B1 is just the WHERE clause, used to get the PKs. It is then joined back to the original table (and any others that are required) to get the "presentation". This avoids duplicating the presentation columns in every UNION ALL branch.

You can take this a step further and insert the PKs into a temporary table first, then join that to the other tables for the presentation aspect.

We do this for very large tables, where the user has many choices of what to filter on.

  DECLARE @MyTempTable TABLE
 (
     MyPK int NOT NULL,
     PRIMARY KEY
     (
         MyPK
     )
 )

 IF @LastName IS NOT NULL
 BEGIN
    INSERT INTO @MyTempTable
    (
         MyPK
    )
    SELECT MyPK
    FROM MyNamesTable
    WHERE LastName = @LastName -- Let's say we have an efficient index for this
 END
 ELSE
 IF @Country IS NOT NULL
 BEGIN
    INSERT INTO @MyTempTable
    (
         MyPK
    )
    SELECT MyPK
    FROM MyNamesTable
    WHERE Country = @Country -- Got an index on this one too
 END

 ... etc

 SELECT ... presentation columns
 FROM @MyTempTable AS T
     JOIN MyNamesTable AS N
         ON N.MyPK = T.MyPK -- a PK join, very efficient
     JOIN ... other tables ...
         ON ....
 WHERE (@LastName IS NULL OR N.LastName = @LastName)
       AND (@Country IS NULL OR N.Country = @Country)

Note that all the tests are repeated in the final WHERE clause [technically you do not need the @LastName one :)], including obscure ones that (let's say) were not among the original filters used to populate @MyTempTable.

The population of @MyTempTable is designed to make the best of whatever parameters are supplied. Perhaps, if both @LastName and @Country are available, filling the table using both is far more efficient than using either one alone, so we create a case for that scenario.
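A combined case of that kind might be sketched as follows. The variable types, the sample values, and the composite index on (LastName, Country) are assumptions made for illustration; in the real procedure @LastName and @Country would be parameters rather than local variables.

```sql
-- Sketch only: types and sample values are hypothetical.
DECLARE @LastName varchar(50) = 'Smith',
        @Country  varchar(50) = 'UK';

DECLARE @MyTempTable TABLE (MyPK int NOT NULL PRIMARY KEY);

IF @LastName IS NOT NULL AND @Country IS NOT NULL
BEGIN
    -- Most specific case first; assumes an index on (LastName, Country)
    INSERT INTO @MyTempTable (MyPK)
    SELECT MyPK
    FROM MyNamesTable
    WHERE LastName = @LastName
      AND Country = @Country;
END
ELSE IF @LastName IS NOT NULL
BEGIN
    INSERT INTO @MyTempTable (MyPK)
    SELECT MyPK
    FROM MyNamesTable
    WHERE LastName = @LastName;
END
ELSE IF @Country IS NOT NULL
BEGIN
    INSERT INTO @MyTempTable (MyPK)
    SELECT MyPK
    FROM MyNamesTable
    WHERE Country = @Country;
END
```

The most selective combination is tested first so that the narrower single-parameter branches only run when the combined one does not apply.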

Scaling issues? Review which queries are actually being run and add cases for the ones that can be improved.

0
