Recursive SQL query to speed up non-indexed query

Question


This question is mostly out of curiosity, since I already have a working query (it just takes a little longer than I would like).

I have a table with 4 million rows. The only index on this table is an auto-increment BigInt ID. The query looks for the distinct values in one of the columns, but only for a single day. Unfortunately, the ReportDate column being evaluated is not of type DateTime or even BigInt; it is a char(8) in the format YYYYMMDD. So the query is a bit slow.

SELECT Category FROM Reports where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112) GROUP BY Category

Note that the date conversion in the query above simply converts today's date to the YYYYMMDD format for comparison.

I was wondering if there is a way to optimize this query based on the fact that I know the only data I am interested in is at the "bottom" of the table. I was thinking of some sort of recursive SELECT that gradually grows a temporary table, which could then be used for the final query.

For example, in pseudo-SQL:

 N = 128
 TemporaryTable = SELECT TOP {N} * FROM Reports ORDER BY ID DESC

 /* Once we hit a date < Today, we can stop */
 if (TemporaryTable does not contain ReportDate < Today)
     N = N**2
     Repeat Select

 /* We now have a smallish table to do our query */
 SELECT Category FROM TemporaryTable
 WHERE ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
 GROUP BY Category

Does that make sense? Is something like this possible?

This is on MS SQL Server 2008.


sql sql-server tsql sql-server-2008 recursion

Matt Oct 12 '10 at 16:31


5 answers

Andrew Barber · Answer 1 · 2010-10-12T16:36:57+0000

I would suggest that you do not need to convert the date that is stored as char data in YYYYMMDD format; that format sorts correctly all by itself. Instead, I would convert the date you are searching for into that format.

Also, the way you have written the conversion, it may be converting the current DateTime for each individual row, so even storing that value once for the whole query could speed things up ... but I think just converting the date you are looking for to that char format will help.

I would also suggest getting the index(es) you need created, of course ... but that is not the question you asked :P

Joe Stefanelli · Answer 2 · 2010-10-12T16:36:19+0000

Why not just create the index you want?

 create index idx_Reports_ReportDate on Reports(ReportDate, Category)
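To see why an index on (ReportDate, Category) helps, here is a minimal sketch. It uses Python's built-in sqlite3 module as a runnable stand-in for SQL Server, so the table layout, the sample rows, and the fixed "today" value are all assumptions, and SQLite's planner output is only an analogy for what SQL Server would do. The point it illustrates is that the query can be answered from the index alone.

```python
import sqlite3

# SQLite stands in for SQL Server here; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reports ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, ReportDate CHAR(8), Category TEXT)"
)
conn.executemany(
    "INSERT INTO Reports (ReportDate, Category) VALUES (?, ?)",
    [("20101011", "A"), ("20101011", "B"),
     ("20101012", "B"), ("20101012", "C"), ("20101012", "C")],
)

# Same column order as the index suggested in the answer.
conn.execute("CREATE INDEX idx_Reports_ReportDate ON Reports (ReportDate, Category)")

today = "20101012"  # hypothetical stand-in for CONVERT(VARCHAR(8), GETDATE(), 112)
query = "SELECT Category FROM Reports WHERE ReportDate = ? GROUP BY Category"

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (today,)))
categories = [row[0] for row in conn.execute(query, (today,))]

print(plan)
print(categories)
```

Because both columns the query touches are in the index, the engine never has to visit the base table for this query.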

Remus Rusanu · Answer 3 · 2010-10-12T16:38:02+0000

No, that doesn't make sense. The only way to optimize this query is to have a covering index for it:

 CREATE INDEX ndxReportDateCategory ON Reports (ReportDate, Category);

Update

Given your comment that you cannot modify the schema: you need to modify the schema. If you still cannot, then the answer still applies: the solution is to have an index.

And finally, to answer your question more directly, if there is a strong correlation between ID and ReportDate: the ID you are looking for is the largest one that has a ReportDate earlier than the date you want:

 SELECT MAX(Id) FROM Reports WHERE ReportDate < 'YYYYMMDD';

This will do a reverse scan of the ID index and stop at the first ID that precedes your desired date (i.e. it will not scan the entire table). Then you can filter your reports based on the max ID that was found.
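The two steps described above can be sketched as follows. This is only an illustration of the logic, run against SQLite through Python's sqlite3 module rather than SQL Server; the sample rows and the fixed "today" value are assumptions, and the sketch says nothing about which scan direction SQLite actually chooses. It also relies on the same assumption as the answer: IDs and ReportDate increase together.

```python
import sqlite3

# SQLite stands in for SQL Server here; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reports ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, ReportDate CHAR(8), Category TEXT)"
)
conn.executemany(
    "INSERT INTO Reports (ReportDate, Category) VALUES (?, ?)",
    [("20101011", "A"), ("20101011", "B"),
     ("20101012", "B"), ("20101012", "C"), ("20101012", "C")],
)

today = "20101012"  # hypothetical stand-in for CONVERT(VARCHAR(8), GETDATE(), 112)

# Step 1: the largest ID whose ReportDate is before today (0 if there is none).
boundary = conn.execute(
    "SELECT COALESCE(MAX(ID), 0) FROM Reports WHERE ReportDate < ?", (today,)
).fetchone()[0]

# Step 2: only rows after the boundary can belong to today, so range-filter on ID
# (which is indexed) and keep the date predicate as a safety net.
categories = [
    row[0]
    for row in conn.execute(
        "SELECT Category FROM Reports "
        "WHERE ID > ? AND ReportDate = ? "
        "GROUP BY Category ORDER BY Category",
        (boundary, today),
    )
]

print(boundary, categories)
```

If rows can be inserted out of date order, the ID range in step 2 may miss some of today's rows, so this approach is only as good as the ID/date correlation.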

John Sansom · Answer 4 · 2010-10-12T17:36:20+0000

I think you will find the discussion of SARGability on Rob Farley's blog very interesting reading in relation to your topic.

http://sqlblog.com/blogs/rob_farley/archive/2010/01/21/sargable-functions-in-sql-server.aspx

An interesting alternative approach that does not require changing the existing column's data type would be to use a computed column.

 alter table REPORTS add castAsDate as CAST(ReportDate as date)
 create index rf_so2 on REPORTS(castAsDate) include (ReportDate)

Amy B · Answer 5 · 2010-10-12T16:45:16+0000

One query pattern that I sometimes use against a log table with indexing similar to yours is to limit the rows with a subquery:

 DECLARE @ReportDate varchar(8)
 SET @ReportDate = Convert(varchar(8), GetDate(), 112)

 SELECT *
 FROM (
     SELECT top 20000 * FROM Reports ORDER BY ID desc
 ) sub
 WHERE sub.ReportDate = @ReportDate

20k / 4M = 0.5% of the table is read.
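The catch with a fixed-size window is that it has to be large enough to hold all of today's rows. A minimal sketch of that trade-off, using Python's sqlite3 module as a stand-in for SQL Server (LIMIT plays the role of TOP; the helper name `categories_in_window`, the sample rows, and the tiny window sizes are all hypothetical):

```python
import sqlite3

# SQLite stands in for SQL Server here; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reports ("
    "ID INTEGER PRIMARY KEY AUTOINCREMENT, ReportDate CHAR(8), Category TEXT)"
)
conn.executemany(
    "INSERT INTO Reports (ReportDate, Category) VALUES (?, ?)",
    [("20101011", "A"), ("20101011", "B"),
     ("20101012", "B"), ("20101012", "C"), ("20101012", "C")],
)


def categories_in_window(conn, report_date, window):
    """Distinct categories for report_date among the newest `window` rows only."""
    sql = (
        "SELECT Category FROM ("
        "  SELECT * FROM Reports ORDER BY ID DESC LIMIT ?"
        ") AS sub "
        "WHERE sub.ReportDate = ? "
        "GROUP BY Category ORDER BY Category"
    )
    return [row[0] for row in conn.execute(sql, (window, report_date))]


today = "20101012"  # hypothetical stand-in for CONVERT(VARCHAR(8), GETDATE(), 112)
wide = categories_in_window(conn, today, 3)    # window covers all of today's rows
narrow = categories_in_window(conn, today, 2)  # window too small, a category is lost
fraction_read = 20000 / 4000000                # the 0.5% figure from the answer

print(wide, narrow, fraction_read)
```

With the narrow window the query still succeeds but silently returns an incomplete answer, which is the motivation for growing the window in a loop.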

Here is the looping solution. Note: you might want to make ID the primary key and index ReportDate in the temp table.

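 DECLARE @ReportDate varchar(8)
 SET @ReportDate = Convert(varchar(8), GetDate(), 112)
 DECLARE @CurrentDate varchar(8), @MinKey bigint

 -- CAST drops the identity property so the INSERT below works
 SELECT top 2000 CAST(ID AS bigint) AS ID, ReportDate, Category
 INTO #MyTable FROM Reports ORDER BY ID desc
 SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID) FROM #MyTable

 -- Pull older batches until a date before today shows up
 WHILE @ReportDate <= @CurrentDate
 BEGIN
     -- INSERT, since #MyTable already exists
     INSERT INTO #MyTable (ID, ReportDate, Category)
     SELECT top 2000 ID, ReportDate, Category FROM Reports
     WHERE ID < @MinKey ORDER BY ID desc
     IF @@ROWCOUNT = 0 BREAK -- no older rows left
     SELECT @CurrentDate = MIN(ReportDate), @MinKey = MIN(ID) FROM #MyTable
 END

 SELECT * FROM #MyTable WHERE ReportDate = @ReportDate
 DROP TABLE #MyTable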
