Why doesn't \\b work in gsubfn in R for me?

I have a string like this:

vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose") 

I want to replace "in" with "%in%", "AND" with "&", and "OR" with "|".

I know that this can be done with gsub as shown below:

 gsub("\\bin\\b", "%in%", vect)

but that takes a separate line for each of the three replacements, so I would prefer to use gsubfn.

So I tried:

 gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect) 

but it returns the string unchanged; for some reason \\b does not work here. However, \\b works fine with gsub, and I can make all three replacements by chaining gsub calls together.
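For reference, the chained gsub approach mentioned above could be sketched roughly like this. The names repl and out are made up for the illustration, and perl = TRUE is a choice made here (plain gsub also handled \\b in my case):

```r
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

# Hypothetical lookup table: word to find -> replacement text
repl <- c("in" = "%in%", "AND" = "&", "OR" = "|")

out <- vect
for (word in names(repl)) {
  # Wrap each word in \b so that "in" inside "Thin" or "lines" is left alone
  pattern <- paste0("\\b", word, "\\b")
  out <- gsub(pattern, repl[[word]], out, perl = TRUE)
}

out
# [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
```

This gives the desired result, but it scans the string once per word, which is what I hoped gsubfn would avoid.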

My question is: why does \\b not work inside gsubfn? What am I missing in my regex?

Please, help.

The output should be:

 "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose" 

Note that this works:

 gsubfn("\\w+", list("in"="%in%", "AND"= "&", "OR"="|"), vect) 
regex r gsubfn
2 answers

By default, gsubfn uses the Tcl regex engine; see the gsubfn docs:

If the R installation has tcltk capability then the tcl engine is used unless FUN is a proto object or perl=TRUE, in which case the "R" engine is used (regardless of the setting of this argument).

So in the Tcl engine a word boundary is written as \y:

 > gsubfn("\\y(in|AND|OR)\\y", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
 [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"

Another way is to use \m for the leading word boundary and \M for the trailing word boundary:

 > gsubfn("\\m(in|AND|OR)\\M", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
 [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"

You can also pass perl=TRUE, which switches to the "R" engine, and use \b:

 > gsubfn("\\b(in|AND|OR)\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl=TRUE)
 [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
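For comparison, the same single-pass lookup can be sketched in base R, which avoids the question of which engine gsubfn picks. This is only an illustration, not how gsubfn works internally; the names repl, m, and out2 are made up here.

```r
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

# Hypothetical lookup table: matched word -> replacement text
repl <- c("in" = "%in%", "AND" = "&", "OR" = "|")

# Find every whole-word match in one pass (PCRE, so \b is a word boundary)
m <- gregexpr("\\b(in|AND|OR)\\b", vect, perl = TRUE)

out2 <- vect
# Swap each matched word for its entry in the lookup table
regmatches(out2, m) <- lapply(regmatches(out2, m), function(w) unname(repl[w]))

out2
# [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
```

Because the matches are located before anything is replaced, a replacement such as "%in%" is never rescanned for further matches.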

Add perl = T, which should do it.

 gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl =T) 

Output:

 [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose" 

From the gsub documentation:

POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of "word" is system-dependent).

And from the gsubfn documentation:

... Other arguments to gsub.

This does not explain why gsub works fine without the perl argument while gsubfn needs perl=T.
