Paging, sorting, and filtering in a stored procedure (SQL Server)

Question

I have been looking at different ways of writing a stored procedure to return a "page" of data. This was for use with the ASP.NET ObjectDataSource, but it could be considered a more general problem.

The requirement is to return a subset of the data based on the usual paging parameters, startPageIndex and maximumRows, plus a sortBy parameter to allow the data to be sorted. There are also some parameters passed in to filter the data on various conditions.

One common way to do this is something like this:

[Method 1]

    ;WITH stuff AS (
        SELECT
            CASE
                WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
                WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
                WHEN @SortBy = ...
                ELSE ROW_NUMBER() OVER (ORDER BY whatever)
            END AS Row,
            ., ., .,
        FROM Table1
        INNER JOIN Table2 ...
        LEFT JOIN Table3 ...
        WHERE ... (lots of things to check)
    )
    SELECT *
    FROM stuff
    WHERE (Row > @startRowIndex)
      AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
    ORDER BY Row

One problem with this is that it does not give the total row count, and for that we generally need another stored procedure. This second stored procedure has to replicate the parameter list and the complex WHERE clause. Not nice.

One solution is to add an extra column to the final select list: (SELECT COUNT(*) FROM stuff) AS TotalRows. This gives us the total, but repeats it for every row in the result set, which is not ideal.

[Method 2]
An interesting alternative using dynamic SQL is given here: http://www.4guysfromrolla.com/articles/032206-1.aspx. The author reckons that performance is better because the CASE statement in the first solution slows things down. Fair enough, and this solution makes it easy to get totalRows and put it into an output parameter. But I hate coding dynamic SQL. All that 'bit of SQL ' + STR(@parm1) + ' bit more SQL' gubbins.

[Method 3]
The only way I can find to get what I want, without repeating code that would have to be kept in sync, and while keeping things reasonably readable, is to go back to the "old way" of using a table variable:

    DECLARE @stuff TABLE (Row INT, ...)

    INSERT INTO @stuff
    SELECT
        CASE
            WHEN @SortBy = 'Name' THEN ROW_NUMBER() OVER (ORDER BY Name)
            WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY Name DESC)
            WHEN @SortBy = ...
            ELSE ROW_NUMBER() OVER (ORDER BY whatever)
        END AS Row,
        ., ., .,
    FROM Table1
    INNER JOIN Table2 ...
    LEFT JOIN Table3 ...
    WHERE ... (lots of things to check)

    -- select the requested page from the table variable (note the @ prefix)
    SELECT *
    FROM @stuff
    WHERE (Row > @startRowIndex)
      AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
    ORDER BY Row

(Or a similar method using an IDENTITY column on the table variable.) Here I can simply add a SELECT COUNT(*) on the table variable to get totalRows and put it into an output parameter.
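A minimal sketch of method 3 with the total captured in a variable might look like the following. It is written as a standalone script so it can be run anywhere: sys.all_objects stands in for the real joins and filters, the column names are illustrative, and in a real procedure @totalRows would be declared as an OUTPUT parameter.

```sql
-- Hypothetical parameters; in a procedure @totalRows would be an OUTPUT parameter.
DECLARE @SortBy        VARCHAR(20) = 'Name DESC',
        @startRowIndex INT = 2,
        @maximumRows   INT = 2,
        @totalRows     INT;

-- Table variable holding the numbered, filtered rows.
DECLARE @stuff TABLE (Row INT PRIMARY KEY, Name SYSNAME, ObjectId INT);

-- sys.all_objects is only a stand-in for the real joins and WHERE clause.
INSERT INTO @stuff (Row, Name, ObjectId)
SELECT CASE
           WHEN @SortBy = 'Name'      THEN ROW_NUMBER() OVER (ORDER BY name, object_id)
           WHEN @SortBy = 'Name DESC' THEN ROW_NUMBER() OVER (ORDER BY name DESC, object_id)
           ELSE ROW_NUMBER() OVER (ORDER BY object_id)
       END,
       name, object_id
FROM sys.all_objects
WHERE type = 'V';

-- The complex query ran once; the total comes from the table variable.
SELECT @totalRows = COUNT(*) FROM @stuff;

-- Return only the requested page.
SELECT Row, Name, ObjectId
FROM @stuff
WHERE Row > @startRowIndex
  AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row;

SELECT @totalRows AS TotalRows;
```

The filter and joins appear only once, which is the property the question is after. The cost is that every matching row is materialised before the page is selected.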

I did some tests, and with a fairly simple version of the query (no sortBy and no filter), method 1 seems to come out on top (almost twice as quick as the other two). Then I decided to test with the complexity I would probably need, and with the SQL in stored procedures. With this, method 1 takes nearly twice as long as the other two methods. Which seems strange.

Is there any good reason why I shouldn't reject CTEs and stick with method 3?

UPDATE - March 15, 2012

I tried adapting method 1 to dump the page from the CTE into a temporary table so that I could extract TotalRows and then select just the relevant columns for the result set. This seemed to add significantly to the time (more than I expected). I should add that I am running this on a laptop with SQL Server Express 2008 (all I have available), but the comparison should still be valid.

I looked again at the dynamic SQL method. It turns out I wasn't really doing it properly (I was just concatenating strings together). I set it up as in the documentation for sp_executesql (with a parameter definition string and a parameter list), and it is much more readable. This method also runs fastest in my environment. Why that should be still puzzles me, but I think the answer is hinted at in Hogan's comment.
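For illustration, a parameterised sp_executesql call along those lines could be sketched as below. This is not the code from the question or the linked article: the inputs, the sys.all_objects source, and the whitelist of sortable columns are all assumptions made so that the script is self-contained. Only the sort column and direction are concatenated into the SQL text; filter and paging values are passed as real parameters, and the total comes back through an OUTPUT parameter.

```sql
-- Hypothetical inputs; in a real procedure these would be parameters.
DECLARE @SortBy        SYSNAME = N'name',
        @SortDesc      BIT     = 1,
        @typeFilter    CHAR(2) = 'V',
        @startRowIndex INT     = 0,
        @maximumRows   INT     = 10,
        @totalRows     INT;

-- Only accept known column names; anything else falls back to a default.
IF @SortBy NOT IN (N'name', N'create_date')
    SET @SortBy = N'name';

-- The filter text is written once and reused for the count and the page.
DECLARE @where NVARCHAR(MAX) = N' WHERE type = @type';

DECLARE @sql NVARCHAR(MAX) = N'
SELECT @totalRowsOut = COUNT(*) FROM sys.all_objects' + @where + N';

WITH numbered AS (
    SELECT name, object_id, create_date,
           ROW_NUMBER() OVER (ORDER BY ' + QUOTENAME(@SortBy)
               + CASE WHEN @SortDesc = 1 THEN N' DESC' ELSE N' ASC' END
               + N', object_id) AS Row
    FROM sys.all_objects' + @where + N'
)
SELECT name, object_id, create_date
FROM numbered
WHERE Row > @start AND (Row <= @start + @max OR @max <= 0)
ORDER BY Row;';

-- Parameter definition string first, then the parameter values.
EXEC sys.sp_executesql
     @sql,
     N'@type CHAR(2), @start INT, @max INT, @totalRowsOut INT OUTPUT',
     @type = @typeFilter,
     @start = @startRowIndex,
     @max = @maximumRows,
     @totalRowsOut = @totalRows OUTPUT;

SELECT @totalRows AS TotalRows;
```

Because the ORDER BY is fixed text by the time the statement is compiled, the optimiser sees a plain sort column rather than a CASE expression, which may be part of why this variant performs well.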

sql-server tsql stored-procedures

Fruitbat Mar 13 '12 at 21:38

4 answers

Andriy m · Answer 1 · 2012-03-13T21:59:11+0000

I would most likely split the @SortBy argument into two, @SortColumn and @SortDirection, and use them like this:

    …
    ROW_NUMBER() OVER (
        ORDER BY
            CASE @SortColumn
                WHEN 'Name' THEN Name
                WHEN 'OtherName' THEN OtherName
                …
            END
            -- multiplying by -1 reverses the order; this only works for numeric sort expressions
            * CASE @SortDirection WHEN 'DESC' THEN -1 ELSE 1 END
    ) AS Row
    …

And this is how the TotalRows column (in the main SELECT) could be defined:

 … COUNT(*) OVER () AS TotalRows …
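Multiplying a sort key by -1 only works when the key is numeric. For character columns, one common alternative is a separate CASE per column and direction. The sketch below combines that with COUNT(*) OVER (); the @People table, its columns, and the sample rows are hypothetical and exist only to make the script self-contained.

```sql
-- Hypothetical sample data; table and column names are illustrative only.
DECLARE @People TABLE (Id INT PRIMARY KEY, Name NVARCHAR(50), City NVARCHAR(50));
INSERT INTO @People (Id, Name, City) VALUES
    (1, N'Alice', N'Oslo'), (2, N'Bob', N'Lima'), (3, N'Carol', N'Rome'),
    (4, N'Dave', N'Kyiv'), (5, N'Eve', N'Bonn');

DECLARE @SortColumn    VARCHAR(20) = 'Name',
        @SortDirection VARCHAR(4)  = 'DESC',
        @startRowIndex INT = 0,
        @maximumRows   INT = 2;

WITH numbered AS (
    SELECT Id, Name, City,
           ROW_NUMBER() OVER (
               ORDER BY
                   -- Each CASE yields NULL unless its column and direction are selected,
                   -- so only one of these sort keys is effective at a time.
                   CASE WHEN @SortColumn = 'Name' AND @SortDirection <> 'DESC' THEN Name END ASC,
                   CASE WHEN @SortColumn = 'Name' AND @SortDirection =  'DESC' THEN Name END DESC,
                   CASE WHEN @SortColumn = 'City' AND @SortDirection <> 'DESC' THEN City END ASC,
                   CASE WHEN @SortColumn = 'City' AND @SortDirection =  'DESC' THEN City END DESC,
                   Id  -- tie-breaker keeps the paging deterministic
           ) AS Row,
           COUNT(*) OVER () AS TotalRows  -- total before paging, repeated on every row
    FROM @People
)
SELECT Id, Name, City, Row, TotalRows
FROM numbered
WHERE Row > @startRowIndex
  AND (Row <= @startRowIndex + @maximumRows OR @maximumRows <= 0)
ORDER BY Row;
```

With the values shown, the page contains Eve and Dave, and TotalRows is 5 on both rows.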

Jason whitish · Answer 2 · 2015-04-02T17:51:30+0000

I would definitely use a combination of a temp table and NTILE for this kind of thing.

A temp table allows you to run your complex set of conditions just once. Because you only store the pieces you care about, it also means that when you select against it later in the procedure, it should have a smaller overall memory footprint than if you ran the conditions several times.

I like NTILE() for this better than ROW_NUMBER() because it does the work you are trying to accomplish for you, rather than leaving you with additional WHERE conditions to worry about.

The example below is based on a similar query that I use as part of a search query; I have an ID that I know will be unique in the results. Using an ID that is an identity column would also be appropriate here.

    --DECLAREs here would be stored procedure parameters
    declare @pagesize int, @sortby varchar(25), @page int = 1;

    --Create temp with all relevant columns; ID here could be an identity PK to help with paging query below
    create table #temp (id int not null primary key clustered, status varchar(50), lastname varchar(100), startdate datetime);

    --Insert into #temp based off of your complex conditions, but with no attempt at paging
    insert into #temp (id, status, lastname, startdate)
    select id, status, lastname, startdate
    from Table1 ...etc.
    where ...complicated conditions

    SET @pagesize = 50;
    SET @page = 5; --OR CAST(@startRowIndex/@pagesize as int)+1
    SET @sortby = 'name';

    --Only use the id and count to use NTILE
    ;with paging(id, pagenum, totalrows) as
    (
        select id,
               --round the page count up so no tile exceeds @pagesize rows and the count is never 0 when rows exist
               NTILE((SELECT (COUNT(*) + @pagesize - 1) / @pagesize FROM #temp))
                   OVER(ORDER BY CASE WHEN @sortby = 'NAME' THEN lastname ELSE convert(varchar(10), startdate, 112) END),
               cnt
        FROM #temp
        cross apply (SELECT COUNT(*) cnt FROM #temp) total
    )
    --Use the id to join back to main select
    SELECT *
    FROM paging
    JOIN #temp ON paging.id = #temp.id
    WHERE paging.pagenum = @page

    --Don't need the drop in the procedure, included here for rerunnability
    drop table #temp;

I generally prefer temp tables over table variables in this scenario, largely because you then get real statistics on the result set. (Search for "temp table vs table variable" and you will find plenty of examples as to why.)

Dynamic SQL would be most useful for handling the sorting method. Using my example, you could build the main query in dynamic SQL and insert only the sort expression you want into the OVER() clause.

The example above also returns the total in each row of the result set, which, as you mentioned, is not ideal. Instead, you could have a @totalrows output variable in your procedure and return it alongside the result set. That would save you the CROSS APPLY that I use above in the paging CTE.
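One detail worth checking before relying on NTILE() for paging: it spreads rows as evenly as possible over the requested number of groups, so the "pages" are not always filled to the page size. The following sketch, using a hypothetical seven-row table and a page size of 3, compares NTILE() with a page number derived from ROW_NUMBER().

```sql
DECLARE @pagesize INT = 3;

-- Hypothetical data: seven rows with ids 1 to 7.
DECLARE @t TABLE (id INT PRIMARY KEY);
INSERT INTO @t (id) VALUES (1), (2), (3), (4), (5), (6), (7);

-- Number of pages, rounded up: (7 + 3 - 1) / 3 = 3.
DECLARE @pagecount INT = (SELECT (COUNT(*) + @pagesize - 1) / @pagesize FROM @t);

SELECT id,
       -- NTILE balances the groups: sizes 3, 2, 2.
       NTILE(@pagecount) OVER (ORDER BY id) AS NtilePage,
       -- ROW_NUMBER arithmetic fills each page first: sizes 3, 3, 1.
       (ROW_NUMBER() OVER (ORDER BY id) - 1) / @pagesize + 1 AS RowNumberPage
FROM @t
ORDER BY id;
```

For ids 1 to 7, NtilePage is 1, 1, 1, 2, 2, 3, 3, while RowNumberPage is 1, 1, 1, 2, 2, 2, 3. If callers expect every page except the last to hold exactly @pagesize rows, the ROW_NUMBER() form is the one that guarantees it.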

John dewey · Answer 3 · 2012-03-14T03:24:13+0000

I would create one procedure to build, sort, and page (using NTILE()) a staging table, and a second procedure to retrieve pages. That way you don't have to run the entire main query for each page.

This example queries AdventureWorks.HumanResources.Employee:

    --------------------------------------------------------------------------
    create procedure dbo.EmployeesByMartialStatus
        @MaritalStatus nchar(1)
      , @sort varchar(20)
    as
    -- Init staging table
    if exists(
        select 1
        from sys.objects o
        inner join sys.schemas s
            on s.schema_id=o.schema_id
            and s.name='Staging'
            and o.name='EmployeesByMartialStatus'
        where type='U'
    )
        drop table Staging.EmployeesByMartialStatus;

    -- Populate staging table with sort value
    with s as (
        select *
             , sr=ROW_NUMBER()over(order by case @sort
                    when 'NationalIDNumber' then NationalIDNumber
                    when 'ManagerID' then ManagerID
                    -- plus any other sort conditions
                    else EmployeeID end)
        from AdventureWorks.HumanResources.Employee
        where MaritalStatus=@MaritalStatus
    )
    select * into #temp from s;

    -- And now pages
    declare @RowCount int;
    select @rowCount=COUNT(*) from #temp;
    declare @PageCount int=ceiling(@rowCount/20.0); --assuming 20 lines/page; 20.0 avoids integer division
    select *
         , Page=NTILE(@PageCount)over(order by sr)
    into Staging.EmployeesByMartialStatus
    from #temp;
    go

    --------------------------------------------------------------------------
    -- procedure to retrieve selected pages
    create procedure EmployeesByMartialStatus_GetPage
        @page int
    as
    declare @MaxPage int;
    select @MaxPage=MAX(Page) from Staging.EmployeesByMartialStatus;
    set @page=case when @page not between 1 and @MaxPage then 1 else @page end;

    select EmployeeID,NationalIDNumber,ContactID,LoginID,ManagerID
         , Title,BirthDate,MaritalStatus,Gender,HireDate,SalariedFlag,VacationHours,SickLeaveHours
         , CurrentFlag,rowguid,ModifiedDate
    from Staging.EmployeesByMartialStatus
    where Page=@page
    GO

    --------------------------------------------------------------------------
    -- Usage
    -- Load staging
    exec dbo.EmployeesByMartialStatus 'M','NationalIDNumber';

    -- Get pages 1 through n
    exec dbo.EmployeesByMartialStatus_GetPage 1;
    exec dbo.EmployeesByMartialStatus_GetPage 2;
    -- ...etc (this would actually be a foreach loop, but that detail is omitted for brevity)
    GO
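A page count computed from an integer row count needs some care, because T-SQL integer division truncates before CEILING() ever sees the value. The short sketch below, with illustrative values of 45 rows and 20 rows per page, shows the difference.

```sql
DECLARE @rowCount INT = 45, @pageSize INT = 20;

SELECT CEILING(@rowCount / @pageSize)          AS TruncatedFirst,   -- 45 / 20 is already 2 before CEILING runs
       CEILING(@rowCount / (@pageSize * 1.0))  AS DecimalDivision,  -- 2.25 rounds up to 3
       (@rowCount + @pageSize - 1) / @pageSize AS IntegerRoundUp;   -- (45 + 19) / 20 = 3
```

The three columns return 2, 3, and 3. Either of the last two forms gives the 3 pages that 45 rows need at 20 rows per page.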

shA.t · Answer 4 · 2015-05-03T15:10:39+0000

I use this method of using EXEC() :

    -- SP parameters:
    -- @query: Your query as an input parameter
    -- @maximumRows: As number of rows per page
    -- @startPageIndex: As number of page to filter
    -- @sortBy: As a field name or field names with supporting DESC keyword
    DECLARE @query nvarchar(max) = 'SELECT * FROM sys.Objects',
            @maximumRows int = 8,
            @startPageIndex int = 3,
            @sortBy as nvarchar(100) = 'name Desc'

    SET @query = ';WITH CTE AS (' + @query + ')' +
        'SELECT *, (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingPageNo' +
        -- (count - 1) / size + 1 rounds the page count up, so a partial last page is counted
        ', (pagingCountRow - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingCountPage ' +
        ', (dt.pagingRowNo - 1) % ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 As pagingRowInPage ' +
        'FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ' + @sortBy + ') As pagingRowNo, COUNT(*) OVER () AS pagingCountRow ' +
        'FROM CTE) dt ' +
        'WHERE (dt.pagingRowNo - 1) / ' + CAST(@maximumRows as nvarchar(10)) + ' + 1 = ' + CAST(@startPageIndex as nvarchar(10))

    EXEC(@query)

The result set contains the columns of the query result, followed by these additional paging columns (which you can remove if you don't need them):

    pagingRowNo : The row number
    pagingCountRow : The total number of rows
    pagingPageNo : The current page number
    pagingCountPage : The total number of pages
    pagingRowInPage : The row number within the current page, starting from 1
