Split a string at the last comma in R

I am not new to R, but I am new to regex.

A similar question can be found here.

For example:

    > strsplit("UK, USA, Germany", ", ")
    [[1]]
    [1] "UK"      "USA"     "Germany"

but I want to get

    [[1]]
    [1] "UK, USA" "Germany"

Another example is

    > strsplit("London, Washington, DC, Berlin", ", ")
    [[1]]
    [1] "London"     "Washington" "DC"         "Berlin"

and I want to get

    [[1]]
    [1] "London, Washington, DC" "Berlin"

Specifically, "Washington, DC" should not be split into two parts; the string should be split only at the last comma, not at every comma.

One viable way, I think, is to replace the last comma with something else, such as $, #, or *, and then use strsplit() to split the string on that replacement character (making sure it is unique!). However, I would prefer a solution that handles the problem directly with a built-in function.
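
The placeholder workaround described above can be sketched in base R roughly as follows. This is only an illustration: the helper name `split_via_placeholder` and the default placeholder `#` are assumptions made for the example, and the guard simply stops if the placeholder already occurs in the input.

```r
# Hypothetical helper: swap the last comma for a placeholder, then split on it.
split_via_placeholder <- function(x, placeholder = "#") {
  # The placeholder must not already occur in the input.
  stopifnot(!any(grepl(placeholder, x, fixed = TRUE)))
  # Greedy ".*" runs up to the last comma, so only that comma (plus any
  # whitespace right after it) is replaced by the placeholder.
  marked <- sub("^(.*),[[:space:]]*", paste0("\\1", placeholder), x)
  strsplit(marked, placeholder, fixed = TRUE)
}

split_via_placeholder("London, Washington, DC, Berlin")
```

A string without any comma is left unchanged by `sub()`, so it comes back as a single piece.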

So how can I do this? Many thanks.

2 answers

Here is one approach:

    strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)
    ## [[1]]
    ## [1] "UK, USA"  " Germany"

To drop the leading space from the last piece as well, let the pattern consume any whitespace after the comma:

    strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)
    ## [[1]]
    ## [1] "UK, USA" "Germany"

This also matches when there is no space after the comma:

    strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)
    ## [[1]]
    ## [1] "UK, USA" "Germany"
    ##
    ## [[2]]
    ## [1] "UK, USA" "Germany"
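
For reuse, the look-ahead pattern can be wrapped in a small function. The sketch below is not part of the original answer; the name `split_last_comma` is hypothetical, and the comments spell out what each part of the pattern does.

```r
# Hypothetical wrapper around the look-ahead pattern shown above.
split_last_comma <- function(x) {
  # ",\\s*"      : a comma plus any whitespace after it (this part is consumed)
  # "(?=[^,]+$)" : look-ahead; only comma-free text may remain up to the end,
  #                so the pattern can match at the last comma only
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

split_last_comma(c("London, Washington, DC, Berlin", "Berlin"))
```

Because the look-ahead is zero-width, the text after the last comma is checked but not removed; a string with no comma at all is returned as a single piece.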

You can use the stri_split_fixed function from the stringi package:

    library(stringi)
    x <- "USA,UK,Poland"

    stri_split_fixed(x, ",") # standard split by comma
    [[1]]
    [1] "USA"    "UK"     "Poland"

    stri_split_fixed(x, ",", n = 2) # set the max number of elements
    [[1]]
    [1] "USA"       "UK,Poland"

Unfortunately, there is no parameter for choosing where splitting starts (from the beginning or from the end), but we can work around this with stri_reverse:

    stri_split_fixed(stri_reverse(x), ",", n = 2) # reverse
    [[1]]
    [1] "dnaloP" "KU,ASU"

    stri_reverse(stri_split_fixed(stri_reverse(x), ",", n = 2)[[1]]) # reverse back
    [1] "Poland" "USA,UK"

    stri_reverse(stri_split_fixed(stri_reverse(x), ",", n = 2)[[1]])[2:1] # and swap the order :)
    [1] "USA,UK" "Poland"
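
If reversing the string feels roundabout, the same "cut once, counting from the end" idea can be sketched in base R by locating the last comma directly. This is an alternative illustration rather than the stringi method above; the helper name `split_at_last_comma` is hypothetical.

```r
# Hypothetical helper: cut each string at its last comma using base R only.
split_at_last_comma <- function(x) {
  # ",[^,]*$" can only match at the last comma, so regexpr() returns its
  # position, or -1 when the string contains no comma.
  pos <- regexpr(",[^,]*$", x)
  lapply(seq_along(x), function(i) {
    if (pos[i] == -1) return(x[i])
    c(substr(x[i], 1, pos[i] - 1), substring(x[i], pos[i] + 1))
  })
}

split_at_last_comma("USA,UK,Poland")
```

Unlike the regex answer, this sketch does not trim whitespace after the comma; wrapping the second piece in `trimws()` would be one way to add that.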