Given the sample queries below (for simplified examples only)
DECLARE @DT int; SET @DT=20110717; -- yes this is an INT WITH LargeData AS ( SELECT * -- This is a MASSIVE table indexed on dt field FROM mydata WHERE dt=@DT ), Ordered AS ( SELECT TOP 10 * , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number FROM LargeData ) SELECT * FROM Ordered
and...
DECLARE @DT int; SET @DT=20110717; BEGIN TRY DROP TABLE
Both give the same results, which are a limited and ranked list of values from a list based on these fields.
When these queries become much more complex (many other tables, many criteria, several levels of "with" the alaises table, etc.), the lower query executes MUCH faster than the upper one. Sometimes in the order of 20x-100x faster.
Question...
Is there some kind of HINT query or another SQL parameter that would prompt SQL Server to perform the same optimization automatically, or other formats of this that would include a cleaner aproach (trying to save the format as much as query 1 as much as possible)?
Please note that “ranking” or secondary queries is just fluff for this example, the actual operations performed do not really matter much.
This is what I was hoping for (or similar, but the idea is clear, hopefully). Remember that this request below does not work.
DECLARE @DT int; SET @DT=20110717; WITH LargeData AS ( SELECT *
EDIT: Important follow-up information!
If in your additional request you add
TOP 999999999 -- improves speed dramatically
Your query will behave similarly to using the temporary table in the previous query. I found that runtime improved almost exactly the same way. WHICH FAR SIMPLIER, then using a temporary table and basically I searched.
but
TOP 100 PERCENT -- does NOT improve speed
NOT executed in the same way (you should use the static style Number TOP 999999999)
Explanation:
From what I can say from the actual plan for executing the request in both formats (the original with the usual CTE and one with each subquery with TOP 99999999)
A regular query joins everything together, as if all the tables were in one massive query, which is what it is. Filtering criteria apply almost to join points in the plan, which means that many more rows are evaluated and merged all at once.
In the version with TOP 999999999, the actual execution plan clearly separates the auxiliary requests from the main request in order to apply the action of the TOP operators, thereby forcing the creation of an auxiliary request in the bitmap memory, which is then connected to the main request. This, apparently, really does exactly what I wanted, and in fact it can be even more efficient, since servers with large amounts of RAM will be able to execute the request completely in MEMORY without any disk I / O. In my case, we have 280 GB of RAM, so much more then we could really use.