Let's take a look at the execution plan.


In this particular query run, Seek returned 51 rows instead of a valued 1.
The following actual query creates a plan with the same form, but it is easier to parse because we have two variables @ID1 and @ID2 that you can track in the plan.
CREATE TABLE #wtfTable (WtfId UNIQUEIDENTIFIER PRIMARY KEY); INSERT INTO #wtfTable SELECT TOP(500) NEWID() FROM master.sys.all_objects o1 (NOLOCK) CROSS JOIN master.sys.all_objects o2 (NOLOCK); DECLARE @ID1 UNIQUEIDENTIFIER; DECLARE @ID2 UNIQUEIDENTIFIER; SELECT TOP(1) @ID1 = WtfId FROM #wtfTable ORDER BY WtfId; SELECT TOP(1) @ID2 = WtfId FROM #wtfTable ORDER BY WtfId DESC; -- ACTUAL QUERY: SELECT * FROM #wtfTable WHERE WtfId IN (@ID1, @ID2); DROP TABLE #wtfTable;
If you carefully study the operators in this regard, you will see that part of the IN query is converted to a table with two rows and three columns. The Concatenation statement returns this table. Each row in this auxiliary table defines a search range in the index.
ExpFrom ExpTo ExpFlags @ID1 @ID1 62 @ID2 @ID2 62
Internal ExpFlags indicate which range search is needed ( < , <= , > , >= ). If you add more variables to the IN clause, you will see them in the plan combined with this helper table.
Operators
Sort and Merge Interval must be sure that all possible overlapping ranges are combined. A detailed post about the Merge Interval operator Fabiano Amorim who is considering plans with this form. Here's another good post about this form of Paul White's plan.
At the end, a two-row helper table joins the main table, and for each row in the auxiliary table, there is a search range in the cluster index from ExpFrom to ExpTo , which is shown in Index Seek . The Seek operator shows < and > , but is misleading. The actual comparison is determined by the internal value of the Flags .
If you have several different ranges, for example:
WHERE ([WtfId] >= @ID1 AND [WtfId] < @ID2) OR [WtfId] = @ID3
you will still see the same plan form with the same search predicate, but with different Flags values.
So there are two options:
from @ID1 to @ID1, which returns one row from @ID2 to @ID2, which returns one row
In a query with variables, internal expressions result in values ββfrom variables if necessary. The value of the variable does not change during query execution, and everything behaves correctly as expected.
How NEWID() affects it
When we use NEWID , as in your example:
SELECT * FROM
the plan and all internal processing are the same as for the variables.
The difference is that this internal table effectively becomes:
ExpFrom ExpTo ExpFlags 0...0 0...0 62 NEWID() NEWID() 62
NEWID() is called twice. Naturally, each call creates a different value, which randomly leads to a range that spans some existing values ββin the table.
There are two scans of a cluster index range with ranges
from `0...0` to `0...0` from `some_id_1` to `some_id_2`
Now itβs easy to understand how such a query can return some rows, although the probability of a NEWID collision NEWID very small.
It seems that the optimizer believes that it can call NEWID twice instead of remembering the first random value generated and use it further in the query. There were other cases where the optimizer, called NEWID more than expected, created similar, seemingly impossible results.
For example:
Is it legal for SQL Server to populate PERSISTED columns with data that does not match the definition?
Inconsistent results with computed column NEWID () and PERSISTED
The optimizer should know that NEWID() not deterministic. All in all, this seems like a mistake.
I don't know anything about the internal components of SQL Server, but my wild guess is: there are constant runtime functions like RAND() . NEWID() was put into this category by mistake. Then someone noticed that people do not expect him to return the same ID as RAND() returns the same random number for every call. And they fixed it, actually updating a new identifier every time NEWID() appears in expressions. But the general rules for the optimizer remained the same as for RAND() , therefore, the higher-level optimizer believes that all calls to NEWID() return the same value and freely rebuild expressions using NEWID() , which leads to unexpected results .
Another question arises about the similar strange behavior of NEWID() :
NEWID () Entered virtual table causes unintentional cross-cast behavior
The response says that there is a Connection Error Report and it is closed as "Do not fix". Microsoft's comments essentially say this is design behavior.
The optimizer does not guarantee synchronization or the number of executions of scalar functions. This is a long established principle. This is a fundamental "freedom" that allows the optimizer to get enough freedom, significant improvements in query execution.