gsub: replacing only part of the matched pattern

I want to use gsub to fix some names in my data. I need names like "R. J." and "A. J." to have no space between the initials.

For instance:

 x <- "A. J. Burnett"

I want to use gsub to match the pattern of the initials, and then remove the space:

 gsub("[A-Z]\\.\\s[A-Z]\\.", "[A-Z]\\.[A-Z]\\.", x)

But I get:

 [1] "[A-Z].[A-Z]. Burnett"

Obviously, instead of [A-Z], I need the actual letters from the original name. How can I do this?

2 answers

Use capture groups: wrap the parts of the pattern you want to keep in (...) and refer to the captured text in the replacement with \\1, \\2, etc. In this example:

 x <- "A. J. Burnett"
 gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", x)
 [1] "A.J. Burnett"

Also note that you do not need to escape the . character in the replacement string, since it has no special meaning there.
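As a minimal sketch of the same idea applied to a whole vector, assuming some made-up names (the `players` vector is hypothetical, not from the question), the backreferences carry each captured initial into the replacement, and names without initials are left alone:

```r
# Hypothetical sample data, not taken from the original question
players <- c("A. J. Burnett", "R. J. Swindle", "Roy Halladay")

# ([A-Z]) captures each initial; \\1 and \\2 put them back,
# while the space between them is dropped from the replacement
fixed <- gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", players)
print(fixed)
# [1] "A.J. Burnett" "R.J. Swindle" "Roy Halladay"
```

Since gsub is vectorized, no loop is needed; elements that do not match the pattern are returned unchanged.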


You can use a lookahead ( (?=\\w\\.) ) and a lookbehind ( (?<=\\b\\w\\.) ) to target just those spaces and replace them with "".

 x <- c("A. J. Burnett", "Dr. R. J. Regex")
 gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", x, perl = TRUE)
 # [1] "A.J. Burnett"   "Dr. R.J. Regex"

The lookahead matches a word character ( \\w ) followed by a period ( \\. ), and the lookbehind matches a word boundary ( \\b ) followed by a word character and a period. Only the space itself is consumed, so only the space is replaced.
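One case where the two approaches may differ is a name with three initials. The sketch below compares them on a hypothetical input ("J. R. R. Tolkien", not from the question). Because lookarounds do not consume the letters around the space, every qualifying space can match, whereas the capture-group pattern consumes two initials per match and so skips the space before the third one.

```r
# Hypothetical input with three initials, not from the original question
y <- "J. R. R. Tolkien"

# Capture groups: "J. R." is consumed as one match, so the space
# before the third initial is never examined as part of a pair
by_groups <- gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", y)

# Lookarounds: only the space is consumed, so each space that sits
# between two initials is matched independently (needs perl = TRUE)
by_lookaround <- gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", y, perl = TRUE)

print(by_groups)
# [1] "J.R. R. Tolkien"
print(by_lookaround)
# [1] "J.R.R. Tolkien"
```

If the data only ever contains two initials, either version should give the same result.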

