Split a string on multiple delimiters

Question


I have a set of addresses:

    34 Main St Suite 23
    435 Center Road Ste 3
    34 Jack Corner Bldg 4
    2 Some Street Building 345

The separators are:

 Suite, Ste, Bldg, Building

I would like to split these addresses into address1 and address2 as follows:

    +-----------------+--------------+
    | Address1        | Address2     |
    +-----------------+--------------+
    | 34 Main St      | Suite 23     |
    | 435 Center Road | Ste 3        |
    | 34 Jack Corner  | Bldg 4       |
    | 2 Some Street   | Building 345 |
    +-----------------+--------------+

How can I define a set of delimiters and split the addresses this way?


sql sql-server tsql sql-server-2008

Asked Jul 25 '12 at 20:16


4 answers

You can use a table of delimiters to drive the split. In this example I use XML for the parsing, but once you have replaced your set of delimiters (Ste, Suite, etc.) with a single reliable delimiter, you can split using any of the many T-SQL-based methods.

    declare @tab table (s varchar(100));

    insert into @tab
    select '34 Main St Suite 23'
    union all select '435 Center Road Ste 3'
    union all select '34 Jack Corner Bldg 4'
    union all select '2 Some Street Building 345'
    union all select '20950 N. Tatum Blvd., Ste 300'
    union all select '1524 McHenry Ave Ste 470';

    -- the set of delimiters to split on
    declare @delimiters table (d varchar(100));

    insert into @delimiters
    select 'Suite'
    union all select 'Ste'
    union all select 'Bldg'
    union all select 'Building';

    -- Wrap the parts in <r> elements: the text before the delimiter becomes r[1],
    -- and the delimiter plus everything after it becomes r[2].
    -- Replacing ' ' + d + ' ' (not just d) leaves words such as 'Sterling' intact.
    select
        s,
        cast('<r>' + replace(s, ' ' + d + ' ', '</r><r>' + d + ' ') + '</r>' as xml),
        [Street1] = cast('<r>' + replace(s, ' ' + d + ' ', '</r><r>' + d + ' ') + '</r>' as xml).value('r[1]', 'varchar(100)'),
        [Street2] = cast('<r>' + replace(s, ' ' + d + ' ', '</r><r>' + d + ' ') + '</r>' as xml).value('r[2]', 'varchar(100)')
    from @tab t
    cross apply @delimiters d
    where charindex(' ' + d + ' ', s) > 0;


Nathan skerl Jul 25 '12 at 20:55
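The answer above points out that once the set of delimiters has been replaced by a single reliable delimiter, any T-SQL splitting method will do. Here is a minimal sketch of that idea without XML, assuming the character `|` never occurs in the addresses. The temp table `#addr`, its columns and the `|` marker are illustrative choices, and the sample rows come from the question and from the accepted answer's test data. It only uses REPLACE, CHARINDEX, LEFT and SUBSTRING, so it should also run on SQL Server 2008.

```sql
-- Sketch: turn every whole-word delimiter into one marker, then split on the marker.
-- Assumptions: '|' never occurs in the data; #addr and its columns are illustrative names.
DECLARE @delimiters TABLE (id int IDENTITY(1, 1), d varchar(20));
INSERT INTO @delimiters (d) VALUES ('Suite'), ('Ste'), ('Bldg'), ('Building');

IF OBJECT_ID('tempdb..#addr') IS NOT NULL DROP TABLE #addr;
CREATE TABLE #addr (s varchar(100), marked varchar(100));
INSERT INTO #addr (s) VALUES
    ('34 Main St Suite 23'), ('435 Center Road Ste 3'),
    ('34 Jack Corner Bldg 4'), ('2 Some Street Building 345'),
    ('123 Sterling Rd'), ('405 29th St Bldg 4 Ste 217');

UPDATE #addr SET marked = s;

-- One REPLACE per delimiter row; the surrounding spaces keep 'Sterling' from matching 'Ste'.
DECLARE @i int = 1, @d varchar(20);
WHILE @i <= (SELECT MAX(id) FROM @delimiters)
BEGIN
    SELECT @d = d FROM @delimiters WHERE id = @i;
    UPDATE #addr SET marked = REPLACE(marked, ' ' + @d + ' ', '|' + @d + ' ');
    SET @i += 1;
END;

-- Everything before the first marker is Address1; the rest is Address2
-- (NULL when no delimiter was found; any later markers become spaces again).
SELECT
    s,
    LEFT(marked, CHARINDEX('|', marked + '|') - 1) AS Address1,
    REPLACE(NULLIF(SUBSTRING(marked, CHARINDEX('|', marked + '|') + 1, 100), ''), '|', ' ') AS Address2
FROM #addr;
```

Because the loop reads the delimiters from `@delimiters`, adding another delimiter only means inserting another row there. An address that contains two delimiters is split at the first one.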


    -- assumes a table named Addr whose addr column holds the full address;
    -- the spaces around each delimiter prevent false matches such as 'Ste' in 'Sterling'
    SELECT
        addr,
        CASE
            WHEN CHARINDEX(' Suite ', addr, 1) > 0 THEN LEFT(addr, CHARINDEX(' Suite ', addr, 1) - 1)
            WHEN CHARINDEX(' Ste ', addr, 1) > 0 THEN LEFT(addr, CHARINDEX(' Ste ', addr, 1) - 1)
            WHEN CHARINDEX(' Bldg ', addr, 1) > 0 THEN LEFT(addr, CHARINDEX(' Bldg ', addr, 1) - 1)
            WHEN CHARINDEX(' Building ', addr, 1) > 0 THEN LEFT(addr, CHARINDEX(' Building ', addr, 1) - 1)
            ELSE addr -- no delimiter found: the whole address is Address1
        END AS [Address1],
        CASE
            WHEN CHARINDEX(' Suite ', addr, 1) > 0 THEN RIGHT(addr, LEN(addr) - CHARINDEX(' Suite ', addr, 1))
            WHEN CHARINDEX(' Ste ', addr, 1) > 0 THEN RIGHT(addr, LEN(addr) - CHARINDEX(' Ste ', addr, 1))
            WHEN CHARINDEX(' Bldg ', addr, 1) > 0 THEN RIGHT(addr, LEN(addr) - CHARINDEX(' Bldg ', addr, 1))
            WHEN CHARINDEX(' Building ', addr, 1) > 0 THEN RIGHT(addr, LEN(addr) - CHARINDEX(' Building ', addr, 1))
        END AS [Address2] -- NULL when no delimiter is found
    FROM Addr


Anandphadke Jul 26 '12 at 7:02


If you are trying to parse this data and the parts are not separated by something (e.g. a comma), it becomes much more complicated and you will have to make some assumptions. A larger dataset can help you make stronger assumptions, but the result will still be very fragile.

Looking at your data, I think you can make one of the following assumptions:

1) Address 2 is always the last 2 words (when split on spaces), so you can split the address on spaces and use the last 2 words as address 2 and the rest as address 1.
2) Address 1 is always the first 3 words, and the rest is address 2.

To separate the data, I would either use a T-SQL equivalent of split(' ', $data) to get an array of words, or use T-SQL equivalents of strpos and strrpos to find the position of the second-to-last space (or of the 3rd space), and then substr everything before and after it into the corresponding variables.

It is up to you to choose the more reliable assumptions based on the data you have, and to work with them.


Tim s Jul 25 '12 at 20:51
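The answer above describes its approach only in words. A minimal T-SQL sketch of both assumptions follows, assuming single spaces between words, no trailing spaces, and at least four words per address. CHARINDEX over REVERSE() stands in for strrpos, CHARINDEX with a start position stands in for strpos, and LEFT/SUBSTRING stand in for substr; the temp table `#words` and its columns are illustrative names.

```sql
-- Sketch of the two word-count assumptions; #words and its columns are illustrative names.
-- Assumes single spaces between words, no trailing spaces, and at least four words.
IF OBJECT_ID('tempdb..#words') IS NOT NULL DROP TABLE #words;
CREATE TABLE #words (s varchar(100), cut_last2 int, cut_first3 int);
INSERT INTO #words (s) VALUES
    ('34 Main St Suite 23'), ('435 Center Road Ste 3'),
    ('34 Jack Corner Bldg 4'), ('2 Some Street Building 345');

UPDATE #words
SET
    -- Assumption 1: position of the second-to-last space. Search the reversed string,
    -- then map back: position p in REVERSE(s) is position LEN(s) - p + 1 in s.
    cut_last2 = LEN(s) - CHARINDEX(' ', REVERSE(s), CHARINDEX(' ', REVERSE(s)) + 1) + 1,
    -- Assumption 2: position of the third space; NULLIF avoids a negative
    -- LEFT() length below if the address contains no space at all.
    cut_first3 = NULLIF(CHARINDEX(' ', s, CHARINDEX(' ', s, CHARINDEX(' ', s) + 1) + 1), 0);

SELECT
    s,
    LEFT(s, cut_last2 - 1)            AS Address1,
    SUBSTRING(s, cut_last2 + 1, 100)  AS Address2,
    LEFT(s, cut_first3 - 1)           AS Address1_first3,
    SUBSTRING(s, cut_first3 + 1, 100) AS Address2_first3
FROM #words;
```

On the question's four sample addresses both assumptions give the same split; they diverge as soon as the street part has more or fewer than three words, which is the fragility the answer warns about.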


ErikE · Accepted Answer · 2012-07-25T20:45:16+0000

    SELECT
       T.Address,
       Left(T.Address, IsNull(X.Pos - 1, 2147483647)) Address1,
       Substring(T.Address, X.Pos + 1, 2147483647) Address2 -- Null if no second
    FROM
       (VALUES
          ('34 Main St Suite 23'),
          ('435 Center Road Ste 3'),
          ('34 Jack Corner Bldg 4'),
          ('2 Some Street Building 345'),
          ('123 Sterling Rd'),
          ('405 29th St Bldg 4 Ste 217')
       ) T (Address)
       OUTER APPLY (
          SELECT TOP 1 NullIf(PatIndex(Delimiter, T.Address), 0) Pos
          FROM
             (VALUES ('% Suite %'), ('% Ste %'), ('% Bldg %'), ('% Building %')) X (Delimiter)
          WHERE T.Address LIKE X.Delimiter
          ORDER BY Pos
       ) X

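I used PatIndex() with patterns that have a space on each side of the delimiter, so an address like "Sterling Rd" will not give a false match on "Ste". When an address contains more than one delimiter, ORDER BY Pos picks the one that appears first.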

Result set:

    Address1         Address2
    ---------------  --------------
    34 Main St       Suite 23
    435 Center Road  Ste 3
    34 Jack Corner   Bldg 4
    2 Some Street    Building 345
    123 Sterling Rd  NULL
    405 29th St      Bldg 4 Ste 217
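To see why the spaces inside the patterns matter, here is a small comparison of a plain CHARINDEX search with the PATINDEX pattern used above, on two of the answer's own test addresses. The positions in the comments are for the strings exactly as written.

```sql
-- CHARINDEX finds 'Ste' inside 'Sterling'; the pattern '% Ste %' only matches
-- 'Ste' with a space on both sides, and PATINDEX returns the position of that leading space.
SELECT
    a.Address,
    CHARINDEX('Ste', a.Address)    AS SubstringPos, -- 5 for '123 Sterling Rd', 17 for the other row
    PATINDEX('% Ste %', a.Address) AS WholeWordPos  -- 0 for '123 Sterling Rd', 16 for the other row
FROM (VALUES ('123 Sterling Rd'), ('435 Center Road Ste 3')) a (Address);
```

Because the returned position is the space before the delimiter, `Left(Address, Pos - 1)` and `Substring(Address, Pos + 1, ...)` in the answer drop that space from both halves.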
