Custom function replacing WHERE col IN (...)

I created a user-defined function to improve performance with queries containing "WHERE col IN (...)", like this case:

SELECT myCol1, myCol2 FROM myTable WHERE myCol3 IN (100, 200, 300, ..., 4900, 5000); 

Requests are generated from a web application and in some cases are much more complicated. The function definition is as follows:

 CREATE FUNCTION [dbo].[udf_CSVtoIntTable] ( @CSV VARCHAR(MAX), @Delimiter CHAR(1) = ',' ) RETURNS @Result TABLE ( [Value] INT ) AS BEGIN DECLARE @CurrStartPos SMALLINT; SET @CurrStartPos = 1; DECLARE @CurrEndPos SMALLINT; SET @CurrEndPos = 1; DECLARE @TotalLength SMALLINT; -- Remove space, tab, linefeed, carrier return SET @CSV = REPLACE(@CSV, ' ', ''); SET @CSV = REPLACE(@CSV, CHAR(9), ''); SET @CSV = REPLACE(@CSV, CHAR(10), ''); SET @CSV = REPLACE(@CSV, CHAR(13), ''); -- Add extra delimiter if needed IF NOT RIGHT(@CSV, 1) = @Delimiter SET @CSV = @CSV + @Delimiter; -- Get total string length SET @TotalLength = LEN(@CSV); WHILE @CurrStartPos < @TotalLength BEGIN SET @CurrEndPos = CHARINDEX(@Delimiter, @CSV, @CurrStartPos); INSERT INTO @Result VALUES (CAST(SUBSTRING(@CSV, @CurrStartPos, @CurrEndPos - @CurrStartPos) AS INT)); SET @CurrStartPos = @CurrEndPos + 1; END RETURN END 

This function is intended to be used like this (or as an INNER JOIN):

 SELECT myCol1, myCol2 FROM myTable WHERE myCol3 IN ( SELECT [Value] FROM dbo.udf_CSVtoIntTable('100, 200, 300, ..., 4900, 5000', ','); 

Does anyone have any optimization flaws in my function or other ways to improve performance in my case? Are there any flaws that I missed?

I am using MS SQL Server 2005 Std and .NET 2.0.

+1
optimization sql-server tsql
source share
5 answers

The CLR solution did not give me good performance, so I will use a recursive query. So, here is the definition of SP that I will use (mainly based on the examples of Erland Sommargogs):

 CREATE FUNCTION [dbo].[priudf_CSVtoIntTable] ( @CSV VARCHAR(MAX), @Delimiter CHAR(1) = ',' ) RETURNS @Result TABLE ( [Value] INT ) AS BEGIN -- Remove space, tab, linefeed, carrier return SET @CSV = REPLACE(@CSV, ' ', ''); SET @CSV = REPLACE(@CSV, CHAR(9), ''); SET @CSV = REPLACE(@CSV, CHAR(10), ''); SET @CSV = REPLACE(@CSV, CHAR(13), ''); WITH csvtbl(start, stop) AS ( SELECT start = CONVERT(BIGINT, 1), stop = CHARINDEX(@Delimiter, @CSV + @Delimiter) UNION ALL SELECT start = stop + 1, stop = CHARINDEX(@Delimiter, @CSV + @Delimiter, stop + 1) FROM csvtbl WHERE stop > 0 ) INSERT INTO @Result SELECT CAST(SUBSTRING(@CSV, start, CASE WHEN stop > 0 THEN stop - start ELSE 0 END) AS INT) AS [Value] FROM csvtbl WHERE stop > 0 OPTION (MAXRECURSION 1000) RETURN END 
+1
source share

I'm not sure about the increase in performance, but I would use it as an internal join and walk away from the internal select statement.

+1
source share

Using UDF in a WHERE clause or (even worse) in a subquery poses problems. The optimizer sometimes does it right, but often makes a mistake and evaluates the function once for each row in the query that you do not need.

If your parameters are static (they seem) and you can release a package with several standards, I would load the results of your UDF into a table variable, and then apply the join with the table variable for your filtering. This should work more reliably.

+1
source share

this cycle will kill performance!

create the table as follows:

 CREATE TABLE Numbers ( Number int not null primary key ) 

which has strings containing values ​​from 1 to 8000 or so, and use this function:

 CREATE FUNCTION [dbo].[FN_ListAllToNumberTable] ( @SplitOn char(1) --REQUIRED, the character to split the @List string on ,@List varchar(8000) --REQUIRED, the list to split apart ) RETURNS @ParsedList table ( RowNumber int ,ListValue varchar(500) ) AS BEGIN /* DESCRIPTION: Takes the given @List string and splits it apart based on the given @SplitOn character. A table is returned, one row per split item, with a columns named "RowNumber" and "ListValue". This function workes for fixed or variable lenght items. Empty and null items will be included in the results set. PARAMETERS: @List varchar(8000) --REQUIRED, the list to split apart @SplitOn char(1) --OPTIONAL, the character to split the @List string on, defaults to a comma "," RETURN VALUES: a table, one row per item in the list, with a column name "ListValue" TEST WITH: ---------- SELECT * FROM dbo.FN_ListAllToNumTable(',','1,12,123,1234,54321,6,A,*,|||,,,,B') DECLARE @InputList varchar(200) SET @InputList='17;184;75;495' SELECT 'well formed list',LEFT(@InputList,40) AS InputList,h.Name FROM Employee h INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue WHERE dt.ListValue IS NOT NULL SET @InputList='17;;;184;75;495;;;' SELECT 'poorly formed list join',LEFT(@InputList,40) AS InputList,h.Name FROM Employee h INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue SELECT 'poorly formed list',LEFT(@InputList,40) AS InputList, ListValue FROM dbo.FN_ListAllToNumTable(';',@InputList) **/ /*this will return empty rows, and row numbers*/ INSERT INTO @ParsedList (RowNumber,ListValue) SELECT ROW_NUMBER() OVER(ORDER BY number) AS RowNumber ,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue FROM ( SELECT @SplitOn + @List + @SplitOn AS ListValue ) AS InnerQuery INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue) WHERE SUBSTRING(ListValue, number, 1) = @SplitOn RETURN END /*Function FN_ListAllToNumTable*/ 

I have other versions that do not return empty or null strings, those that only return an element, not a row number, etc. Look in the header comments to see how to use this as part of the JOIN, which is much faster than in the where clause.

+1
source share

Thanks for the contribution, I have to admit that I did a bad research before starting my work. I found that Erland Sommargarg wrote a lot of this problem on his web page, after your answers and after reading his page I decided that I would try to make a CLR to solve this problem.

I tried a recursive query, this led to good performance, but I will try the CLR function anyway.

0
source share

All Articles