SQL "WITH" Performance and Temp Table (possible "query hint" to simplify)

Given the sample queries below (simplified examples only):

 DECLARE @DT int;
 SET @DT = 20110717;  -- yes this is an INT

 WITH LargeData AS (
     SELECT *         -- This is a MASSIVE table indexed on the dt field
     FROM mydata
     WHERE dt = @DT
 ), Ordered AS (
     SELECT TOP 10 *,
         ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
     FROM LargeData
 )
 SELECT * FROM Ordered

and...

 DECLARE @DT int;
 SET @DT = 20110717;

 BEGIN TRY DROP TABLE #LargeData END TRY BEGIN CATCH END CATCH;  -- drop any possible leftover temp table

 SELECT *             -- This is a MASSIVE table indexed on the dt field
 INTO #LargeData      -- put the smaller result set into the temp table
 FROM mydata
 WHERE dt = @DT;

 WITH Ordered AS (
     SELECT TOP 10 *,
         ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
     FROM #LargeData
 )
 SELECT * FROM Ordered

Both queries give the same results: a limited, ranked list of values based on those fields.

When these queries become much more complex (many other tables, many criteria, several levels of WITH table aliases, etc.), the lower query executes MUCH faster than the upper one, sometimes on the order of 20x-100x faster.

Question...

Is there some kind of query hint or other SQL parameter that would prompt SQL Server to perform the same optimization automatically, or some other format that offers a cleaner approach (keeping the format as close to query 1 as possible)?

Please note that the "ranking" and the secondary queries are just fluff for this example; the actual operations performed do not really matter much.

This is what I was hoping for (or something similar; hopefully the idea is clear). Note that the query below does not work.

 DECLARE @DT int;
 SET @DT = 20110717;

 WITH LargeData AS (
     SELECT *         -- This is a MASSIVE table indexed on the dt field
     FROM mydata
     WHERE dt = @DT
     OPTION (USE_TEMP_OR_HARDENED_OR_SOMETHING)  -- EXAMPLE ONLY, not real syntax
 ), Ordered AS (
     SELECT TOP 10 *,
         ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
     FROM LargeData
 )
 SELECT * FROM Ordered

EDIT: Important follow-up information!

If you add the following to your sub-queries:

  TOP 999999999 -- improves speed dramatically 

your query will behave much like the temp-table version in the previous query. I found that the runtime improved by almost exactly the same amount. This is FAR SIMPLER than using a temporary table and is basically what I was searching for.

but

  TOP 100 PERCENT -- does NOT improve speed 

is NOT executed the same way (you must use a static number such as TOP 999999999).
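
For concreteness, query 1 with the trick applied looks like this (a sketch; the static TOP inside the CTE is the only change from the original):

 DECLARE @DT int;
 SET @DT = 20110717;

 WITH LargeData AS (
     SELECT TOP 999999999 *  -- the static TOP is the only change; it makes this sub-query evaluate separately
     FROM mydata
     WHERE dt = @DT
 ), Ordered AS (
     SELECT TOP 10 *,
         ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
     FROM LargeData
 )
 SELECT * FROM Ordered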

Explanation:

From what I can tell from the actual execution plans of both formats (the original with the plain CTE, and the one where each sub-query uses TOP 999999999):

The regular query joins everything together, as if all the tables were in one massive query, which is what it is. The filtering criteria are applied near the join points in the plan, which means many more rows are evaluated and merged all at once.

In the version with TOP 999999999, the actual execution plan clearly separates the sub-queries from the main query in order to apply the TOP operators, thereby forcing each sub-query to be materialized in memory as a bitmap, which is then joined to the main query. This appears to do exactly what I wanted, and in fact it can be even more efficient, since servers with large amounts of RAM can execute the query entirely in MEMORY without any disk I/O. In my case we have 280 GB of RAM, far more than we could really use.

3 answers

Not only can you use indexes on temporary tables, they also allow the use of statistics and hints. I cannot find a reference about statistics in the CTE documentation, but it does say that you cannot use hints on a CTE.

Temp tables are often the more efficient choice for a large data set when choosing between temp tables and table variables, even if you don't use indexes (presumably because the optimizer can use statistics to develop the plan), and I suspect the CTE implementation behaves more like a table variable than a temp table.
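
To illustrate the hint point, here is a rough sketch: a table hint can be attached to the temp table from query 2, while there is nowhere to attach one on a CTE alias (the index name IX_LD_value is made up for the example):

 -- Assumes #LargeData was populated as in query 2
 CREATE NONCLUSTERED INDEX IX_LD_value ON #LargeData (valuefield DESC);

 SELECT TOP 10 *
 FROM #LargeData WITH (INDEX(IX_LD_value))  -- table hint: allowed on a temp table
 ORDER BY valuefield DESC;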

I think the best thing you can do is compare how the two execution plans differ to determine whether this can be fixed.

What is your objection to using the temp table when you know that it works better?


The problem is that in the first query, the SQL Server query optimizer is able to generate a query plan. In the second query, a good query plan cannot be generated because you are inserting values into a new temporary table. I am guessing that a full table scan is happening somewhere that you are not seeing.

In the second query, you can insert the values into the temporary table #LargeData just as you do now, and then create a non-clustered index on the valuefield column. This can help improve performance.
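
A minimal sketch of that suggestion, reusing the names from the question (the index name is invented for the example):

 SELECT *                 -- smaller filtered result set
 INTO #LargeData
 FROM mydata
 WHERE dt = @DT;

 CREATE NONCLUSTERED INDEX IX_LargeData_valuefield
     ON #LargeData (valuefield DESC);  -- matches the ORDER BY valuefield DESC used for ranking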

0
source

It is possible that SQL Server is optimizing for the wrong parameter value.

There are several options

  • Try using OPTION (RECOMPILE). There is a cost to this, as it recompiles the query each time, but if you need different plans it may be worth it.

  • You can also try OPTION (OPTIMIZE FOR (@DT = SomeRepresentativeValue)). The risk with this is that you might choose the wrong value. Both options are sketched after this list.
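
As a sketch, both options attach to the end of query 1 like this (20110717 is just the sample value from the question):

 DECLARE @DT int;
 SET @DT = 20110717;

 WITH LargeData AS (
     SELECT * FROM mydata WHERE dt = @DT
 ), Ordered AS (
     SELECT TOP 10 *,
         ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
     FROM LargeData
 )
 SELECT * FROM Ordered
 OPTION (RECOMPILE);
 -- or, to compile the plan around a representative value instead:
 -- OPTION (OPTIMIZE FOR (@DT = 20110717));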

See "I Smell a Parameter!" from the SQL Server Query Optimization Team blog.

