Remove duplicate characters

Question

I have a string in my stored procedure, for example ',,,sam,,bob,' or ',,,'. I need to remove the repeated commas from it so that it looks like 'sam,bob,', or, if the string is just ',,,', then ''. I may only use SQL Server functions. I am using SQL Server 2008 and .NET 3.5.

Thanks in advance.

+8

sql tsql sql-server-2008

Nash Apr 26 '11 at 17:33

5 answers

I would suggest a UDF for this. Since the UDF I'm going to suggest doesn't touch any tables, the performance should be pretty good.

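    CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))
    Returns VarChar(8000)
    WITH SCHEMABINDING
    AS
    Begin
        -- Prepend one extra copy of the character so that a leading run
        -- collapses to a single character that is stripped again at the end
        Set @Data = @DuplicateChar + @Data

        -- Keep collapsing pairs until no two adjacent copies remain.
        -- Note: PATINDEX treats %, _ and [ as wildcards, so avoid those characters here
        While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%', @Data) > 0
            Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar, @DuplicateChar)

        -- Drop the character that was prepended above
        Return Right(@Data, Len(@Data) - 1)
    End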

You can test the function like this:

    Select dbo.CleanDuplicates(',,,', ',')           -- returns ''
    Select dbo.CleanDuplicates(',,,sam,,bob,', ',')  -- returns 'sam,bob,'

+5

G Mastros Apr 26 '11 at 17:53

Try this:

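    DECLARE @Parameter VARCHAR(8000) = ',,,sam,,bob,'  -- sample input

    SELECT @Parameter AS 'BEFORE'
    BEGIN
        -- Collapse each pair of commas until no pair is left
        WHILE CHARINDEX(',,', @Parameter) > 0
        BEGIN
            SELECT @Parameter = REPLACE(@Parameter, ',,', ',')
        END
        SELECT @Parameter AS 'AFTER'
    END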
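As written, the loop leaves one comma at the front: ',,,sam,,bob,' becomes ',sam,bob,' and ',,,' becomes ',', not the '' the question asks for. A minimal sketch of one way to close that gap, assuming a single leading comma should always be dropped, is to strip it with STUFF after the loop (the sample value is made up for illustration):

```sql
-- Hypothetical sample input; any string works.
DECLARE @Parameter VARCHAR(8000) = ',,,sam,,bob,';

-- Collapse every run of commas to a single comma (same loop as above).
WHILE CHARINDEX(',,', @Parameter) > 0
    SET @Parameter = REPLACE(@Parameter, ',,', ',');

-- Assumption: drop the one leading comma that may be left over,
-- so ',,,' ends up as '' and ',,,sam,,bob,' as 'sam,bob,'.
IF LEFT(@Parameter, 1) = ','
    SET @Parameter = STUFF(@Parameter, 1, 1, '');

SELECT @Parameter AS [AFTER];   -- 'sam,bob,'
```

This mirrors what George's function does with RIGHT and Jeff's function does with STUFF: both remove exactly one character after the runs have been collapsed.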

+2

cookiemonsta May 24 '12 at 6:11

George Mastros wrote:
I would suggest a UDF for this. Since the UDF I'm going to suggest doesn't touch any tables, the performance should be pretty good.

I agree that "memory only" Scalar UDFs are pretty fast. In fact, I actually used one of George's Scalar UDFs, which solved the "Initial Caps" problem, to demonstrate that sometimes the "best" code ISN'T always the best fit.

However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still appropriate. Of course, anyone can make an unfounded performance claim, so let's back this one up with a performance demonstration.

To demonstrate, we first need some test data. A lot of test data, because both functions we're going to test are pretty fast. Here is the code for creating a million-row test table.

    --===== Conditionally drop the test table
         -- to make reruns in SSMS easier
         IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
            DROP TABLE #MyHead
    GO
    --===== Create and populate the test table on-the-fly.
         -- This builds a bunch of GUIDs and removes the dashes from them to
         -- increase the chances of duplicating adjacent characters.
         -- Not to worry. This takes less than 7 seconds to run because of
         -- the "Pseudo Cursor" created by the CROSS JOIN.
     SELECT TOP 1000000
            RowNum     = IDENTITY(INT,1,1),
            SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
       INTO #MyHead
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ;
    GO

There's no need to repost George's excellent function here, but I do need to post mine. The following function produces the same result as George's. It looks like an "iTVF" (Inline Table Valued Function), but it returns only a single value. That's why Microsoft calls them "Inline Scalar Functions" (I call them "iSFs" for short).

    CREATE FUNCTION dbo.CleanDuplicatesJBM
            (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
     -- Prepend one @DuplicateChar so a leading run collapses like any other run,
     -- collapse runs with a fixed chain of REPLACEs (33, 17, 9, 5, 3, 2, 2),
     -- then let STUFF remove the prepended character again.
     -- The binary collation makes the comparisons fast (and case-sensitive).
     SELECT Item = STUFF(
                REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
            ,1,1,'')
    ;
    GO
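Because the iSF uses a fixed number of REPLACE calls rather than a loop, it only collapses runs up to some maximum length. A quick, self-contained way to check that the 33/17/9/5/3/2/2 chain copes with every run length up to 8,000 (the size of the VARCHAR(8000) parameter) is to run the same expressions over each length and count how often more than one comma survives. This sketch writes the chain out as a plain expression with ',' as the duplicate character, so no function object is needed; the variable names are made up for illustration.

```sql
-- Sketch: feed the REPLACE chain from dbo.CleanDuplicatesJBM a run of
-- n commas for n = 1..8000 and count the runs that are not reduced to ','.
DECLARE @n        INT = 1,
        @Run      VARCHAR(8000),
        @Failures INT = 0;

WHILE @n <= 8000
BEGIN
    SET @Run = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                   REPLICATE(',', @n) COLLATE LATIN1_GENERAL_BIN,
               REPLICATE(',',33),','),
               REPLICATE(',',17),','),
               REPLICATE(',', 9),','),
               REPLICATE(',', 5),','),
               REPLICATE(',', 3),','),
               REPLICATE(',', 2),','),
               REPLICATE(',', 2),',');

    -- Anything other than a single comma means the chain fell short.
    IF @Run <> ','
        SET @Failures = @Failures + 1;

    SET @n = @n + 1;
END;

SELECT @Failures AS Failures;   -- expected: 0
```

Applying the same "L/D = Q remainder R" reasoning as Martin Smith's accepted answer, the shortest run this chain would fail on is far longer than 8,000 characters, which is why the count comes out as zero.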

First, let's test George's Scalar UDF. Please read the comments for why we're not using SET STATISTICS TIME ON here.

    /******************************************************************************
     Test George's code.
     Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
     duration a different way. We'll also throw away the result in a "Bit Bucket"
     variable because we're trying to measure the performance of the function
     rather than how long it takes to display or store results.
    ******************************************************************************/
    --===== Declare some obviously named variables
    DECLARE @StartTime DATETIME,
            @BitBucket VARCHAR(36)
    ;
    --===== Start the "Timer"
     SELECT @StartTime = GETDATE()
    ;
    --===== Run the test on the function
     SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
       FROM #MyHead
    ;
    --===== Display the duration in milliseconds
      PRINT DATEDIFF(ms,@StartTime,GETDATE())
    ;
    --===== Run the test a total of 5 times
    GO 5

Here are the results from this "fiver" run ...

    Beginning execution loop
    15750
    15516
    15543
    15480
    15510
    Batch execution completed 5 times.

    (Average is 15,559 on my 10 year old, single 1.8Ghz CPU)

Now let's run the "iSF" version...

    /******************************************************************************
     Test Jeff's code.
     Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly the
     same way that we tested George's code so we're comparing apples-to-apples.
     This includes throwing away the result in a "Bit Bucket" variable because
     we're trying to measure the performance of the function rather than how long
     it takes to display or store results.
    ******************************************************************************/
    --===== Declare some obviously named variables
    DECLARE @StartTime DATETIME,
            @BitBucket VARCHAR(36)
    ;
    --===== Start the "Timer"
     SELECT @StartTime = GETDATE()
    ;
    --===== Run the test on the function
     SELECT @BitBucket = cleaned.Item
       FROM #MyHead
      CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
    ;
    --===== Display the duration in milliseconds
      PRINT DATEDIFF(ms,@StartTime,GETDATE())
    ;
    --===== Run the test a total of 5 times
    GO 5

Here are the results of that run.

    Beginning execution loop
    6856
    6810
    7020
    7350
    6996
    Batch execution completed 5 times.

    (Average is 7,006 {more than twice as fast} on my 10 year old, single 1.8Ghz CPU)

My point isn't that George's code is bad. Far from it. In fact, I use Scalar UDFs when there is no "single query" solution. I'll also speak up for George and say that not all "single query" solutions are always the best.

Just don't stop looking for them when it comes to UDFs. ;-)

+1

Jeff Moden Jul 07 '12 at 3:59

Your solutions are good, but

they only handle commas
I hate loop-based T-SQL ;-)

So I wrote generic, set-based code, based on Martin's solution, that replaces every declared kind of duplicate:

    DECLARE @Duplicate NVARCHAR(100)= '#$'
    DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$$g'
    DECLARE @Replacement NVARCHAR(MAX)= ''
    DECLARE @OutputString NVARCHAR(MAX)= @TestString ;

    WITH numbers AS (
        -- Numbers 1..n from a cross join of system objects
        SELECT ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
        FROM sys.objects o
        CROSS JOIN sys.objects o2
    ),
    chars AS (
        -- Split @Duplicate into one row per character (recursive CTE)
        SELECT SUBSTRING(@Duplicate, 1, 1) [CHAR] ,
               CAST(1 AS INT) [LEVEL]
        UNION ALL
        SELECT SUBSTRING(@Duplicate, numbers.Number, 1) [CHAR] ,
               CAST(numbers.Number AS INT) [LEVEL]
        FROM numbers
        JOIN chars ON chars.[LEVEL] + 1 = numbers.Number
        WHERE LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
    ),
    Replicated AS (
        -- Every run length of every declared character
        SELECT REPLICATE([CHAR], numbers.Number) Repl ,
               numbers.Number
        FROM chars
        CROSS JOIN numbers
    )
    SELECT @OutputString = REPLACE(@OutputString, Repl, @Replacement)
    FROM Replicated
    WHERE Number <= LEN(@TestString)

    SELECT @OutputString

You can declare every kind of character to deduplicate in @Duplicate and any replacement string in @Replacement. An additional advantage, IMHO, is that I only look for replacements up to the length of the input string.

0

Dalex Apr 27 '11 at 11:30

Martin Smith · Accepted Answer · 2011-04-26T17:59:15+0000

This works for strings that are exclusively commas or have up to 398 contiguous commas.

    SELECT CASE
             WHEN TargetString NOT LIKE '%[^,]%'
             THEN '' /*The string is exclusively commas*/
             ELSE REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
                  REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/
                  REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder  7*/
                  REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder  3*/
                  REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder  1*/
                  REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder  1*/
           END
    FROM T

Add extra powers of 2 on top if you need more, or remove them from the top if you need fewer. The comment on each step gives the smallest number of commas that this step (together with the steps after it) cannot handle successfully.

All comment lines are in this format.

    /* L/D = Q remainder R */

    D:   the length of the string generated by `REPLICATE`
    R:   always D-1
    Q+R: forms L for the next step
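Reading that format backwards gives a closed form for each step's limit, which is one way to reproduce the numbers in the comments. In this sketch $F_k$ (notation introduced here, not Martin's) is the shortest run of commas that step $k$ and the steps after it fail to reduce to one comma, and $D_k$ is that step's `REPLICATE` length. Taking $R = D_k - 1$ and $Q + R = F_{k+1}$ gives $L = Q \cdot D_k + R$; two commas left after the last step count as failure:

```latex
\begin{aligned}
F_k &= \bigl(F_{k+1} - (D_k - 1)\bigr)\,D_k + (D_k - 1), \qquad F_{\text{end}} = 2 \\
D = 2:&\quad (2 - 1)\cdot 2 + 1 = 3 \\
D = 2:&\quad (3 - 1)\cdot 2 + 1 = 5 \\
D = 4:&\quad (5 - 3)\cdot 4 + 3 = 11 \\
D = 8:&\quad (11 - 7)\cdot 8 + 7 = 39 \\
D = 16:&\quad (39 - 15)\cdot 16 + 15 = 399
\end{aligned}
```

Every run shorter than the final value collapses to a single comma, which matches the 398-comma limit stated at the top of the answer.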

So, to extend the series upward with another step, REPLICATE(',',32),','):

    D = 32
    R = 31
    Q = 368 (399-31)
    L = (368 * 32) + 31 = 11807

So this handles runs of commas up to 11806 characters long.
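Putting that extra step into the query could look like the sketch below. The table variable @T stands in for the answer's table T, and its Cleaned column and sample rows are made up for illustration. The third row has a 1000-comma run, which is longer than the 16-step version handles but well within the 11806 limit of the 32-step version.

```sql
-- Hypothetical stand-in for table T, with a column to hold the result.
DECLARE @T TABLE (TargetString VARCHAR(8000), Cleaned VARCHAR(8000));

INSERT INTO @T (TargetString)
VALUES (',,,sam,,bob,'),                     -- becomes ',sam,bob,' (one leading comma is kept)
       (',,,'),                              -- becomes ''
       ('a' + REPLICATE(',', 1000) + 'b');   -- becomes 'a,b'

-- Martin Smith's expression with the REPLICATE(',',32) step added on top.
UPDATE @T
SET Cleaned = CASE
        WHEN TargetString NOT LIKE '%[^,]%'
        THEN '' /*The string is exclusively commas*/
        ELSE REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
             REPLICATE(',',32),','), /*11807/32 = 368 remainder 31*/
             REPLICATE(',',16),','), /*  399/16 =  24 remainder 15*/
             REPLICATE(',',8),','),  /*   39/ 8 =   4 remainder  7*/
             REPLICATE(',',4),','),  /*   11/ 4 =   2 remainder  3*/
             REPLICATE(',',2),','),  /*    5/ 2 =   2 remainder  1*/
             REPLICATE(',',2),',')   /*    3/ 2 =   1 remainder  1*/
    END;

SELECT TargetString, Cleaned FROM @T;
```

Note that, like the original query, this collapses runs but does not strip a single leading comma, so ',,,sam,,bob,' comes back as ',sam,bob,'.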
