George Mastros wrote:
I would suggest a UDF for this. Since the UDF I'm going to suggest doesn't touch any tables, the performance should be pretty good.
I agree that "memory only" scalar UDFs are pretty fast. In fact, I actually used one of George's scalar UDFs, which solved the "Initial Caps" problem, to demonstrate that the "best" code IS NOT always the best fit.
However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still the way to go. Of course, anyone can make an unfounded performance claim, so let's back this one up with a performance demonstration.
To demonstrate, we first need some test data. A lot of test data, because both of the functions we're going to test are pretty fast. Here is the code to build a million-row test table.
--===== Conditionally drop the test table
     -- to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE
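The script above is cut off, so here is only a rough sketch of how a million-row table of this shape could be built. Everything in it is an assumption on my part: the table name #MyHeadSketch is hypothetical, the GUID-based strings are just a convenient way to get 36-character values with occasional runs of 'A', and the cross join on sys.all_columns is only a handy row source. The one thing taken from the tests is the column name SomeString.

```sql
--===== Hypothetical stand-in for the truncated script; not the original code.
     -- Conditionally drop the sketch table to make reruns easier.
     IF OBJECT_ID('tempdb..#MyHeadSketch','U') IS NOT NULL
        DROP TABLE #MyHeadSketch
;
--===== Build a million rows of GUID-based strings (assumed pattern).
     -- Replacing the dashes with 'A' keeps the length at 36 characters
     -- and creates a run of 'A' wherever a dash sits next to a hex 'A'.
 SELECT TOP (1000000)
        SomeString = CAST(REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','A') AS VARCHAR(36))
   INTO #MyHeadSketch
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
```

Timings taken against a table like this will not match the numbers quoted in this post, because the data is not the same.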
There is no need to repost George's fine function here, but I do need to post mine. The following function produces the same result as George's. It looks like an "iTVF" (Inline Table Valued Function), but it returns only a single value. That is why Microsoft calls them "inline scalar functions" (I call them "iSF" for short).
 CREATE FUNCTION dbo.CleanDuplicatesJBM
        (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
 SELECT Item = STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                ,1,1,'')
;
GO
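For anyone who wants to see the nested REPLACEs at work before running a million rows through them, a quick sanity check might look like the following. The three sample strings are made up for illustration, and the query assumes dbo.CleanDuplicatesJBM has been created exactly as posted. Tracing the second string by hand: 'A' is prepended to give 'ABAAAAC', the 3-character REPLACE turns that into 'ABAAC', the 2-character REPLACE leaves 'ABAC', and the STUFF removes the first character.

```sql
--===== Call the function against a few hand-made strings.
     -- UNION ALL is used instead of a VALUES list so that this
     -- also runs on older versions of SQL Server.
 SELECT t.Original, cleaned.Item
   FROM (
         SELECT 'BAC'             UNION ALL -- nothing to collapse
         SELECT 'BAAAAC'          UNION ALL -- run of 4  -> 'BAC'
         SELECT 'BAABAAAAAAAAAAC'           -- run of 10 -> 'BABAC'
        ) t (Original)
  CROSS APPLY dbo.CleanDuplicatesJBM(t.Original,'A') cleaned
;
```

Note that because the function prepends @DuplicateChar and then removes the first character of the result, a run of the duplicate character at the very start of the string is removed rather than reduced to one character.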
First, let's test George's scalar UDF. Please read the comments to see why we're not using SET STATISTICS TIME ON here.
/******************************************************************************
 Test George's code.
 Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
 duration a different way.  We'll also throw away the result in a "Bit Bucket"
 variable because we're trying to measure the performance of the function
 rather than how long it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
   FROM #MyHead
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5
Here are the results from this "fiver" run ...
Beginning execution loop
15750
15516
15543
15480
15510
Batch execution completed 5 times.

(Average is 15,559 on my 10 year old, single 1.8GHz CPU)
Now we'll run the "iSF" version ...
/******************************************************************************
 Test Jeff's code.
 Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
 the same way that we tested George's code so we're comparing apples-to-apples.
 This includes throwing away the result in a "Bit Bucket" variable because
 we're trying to measure the performance of the function rather than how long
 it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = cleaned.Item
   FROM #MyHead
  CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5
Here are the results from that run.
Beginning execution loop
6856
6810
7020
7350
6996
Batch execution completed 5 times.

(Average is 7,006 {more than twice as fast} on my 10 year old, single 1.8GHz CPU)
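If you'd like to check the averages rather than take my word for them, the ten timings above can be fed straight back into a query. This is only a convenience sketch: the column names are made up, and the pairing of the rows is arbitrary because only the column totals matter.

```sql
--===== Timings (ms) copied from the two result sets above.
     -- AVG over integers returns an integer, so the fractions are dropped.
 SELECT AvgScalarUDF_ms = AVG(t.ScalarUDF_ms),
        AvgISF_ms       = AVG(t.ISF_ms),
        SpeedRatio      = SUM(t.ScalarUDF_ms) * 1.0 / SUM(t.ISF_ms)
   FROM (
         SELECT 15750, 6856 UNION ALL
         SELECT 15516, 6810 UNION ALL
         SELECT 15543, 7020 UNION ALL
         SELECT 15480, 7350 UNION ALL
         SELECT 15510, 6996
        ) t (ScalarUDF_ms, ISF_ms)
;
```

The two averages come back as 15559 and 7006, and the ratio works out to roughly 2.2, which is where the "more than twice as fast" comes from.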
My point is NOT that George's code is bad. Not at all. In fact, I use scalar UDFs when there is no "single query" solution. I'll also back George up by saying that not all "single query" solutions are always the best.
Just don't stop looking for them when it comes to UDFs. ;-)