Regex for \x96-like characters

I have several rows in a dataset that contain junk characters such as

\x96 \x92 

and others.

I can't figure out how to match them in R.
I tried using

 pattern="\x96"
 pattern="\\x96"
 pattern="x96"

but to no avail.

Is there a specific way to deal with such characters, in R in particular?


** UPDATE ** As suggested in the comments, setting perl=TRUE allows grep to work.

Can someone offer a solid explanation of what is happening?

Session information, in case it is relevant:

 > sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C
 [10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics grDevices utils datasets methods base

 other attached packages:
 [1] ggplot2_0.9.3 RMySQL_0.9-3 DBI_0.2-5 stringr_0.6.1 data.table_1.8.6
1 answer

R supports several different flavors of regular expressions. The default is POSIX ERE (extended regular expressions), the same flavor used by grep -E and other standard POSIX tools. But the POSIX ERE engine in R currently does not support hexadecimal character escapes:

Escaping non-metacharacters with a backslash is implementation-dependent. The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as LF, \r as CR and \t as TAB. (Note that these will be interpreted by R's parser in literal character strings.)

See "Regular Expressions as used in R" (?regex).

Setting perl = TRUE switches the engine R uses to process regular expressions to PCRE (Perl-compatible regular expressions). PCRE supports hexadecimal character escapes, and voilà, your regex now works.
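
To illustrate, here is a minimal sketch of matching and replacing the byte 0x96 with PCRE. The sample strings and the variable names (dirty, clean, hits, fixed) are made up for this example. The useBytes = TRUE argument is an addition that the question and comments do not mention: it is assumed here because a lone 0x96 byte is not valid UTF-8, and in a UTF-8 locale R may otherwise reject the input as an invalid string.

```r
# Build the test string byte by byte so the R parser never sees an escape:
# "foo", then the raw byte 0x96, then "bar".
dirty <- rawToChar(as.raw(c(0x66, 0x6f, 0x6f, 0x96, 0x62, 0x61, 0x72)))
clean <- "foobar"
x <- c(dirty, clean)

# "\\x96" reaches the regex engine as the four characters \x96.
# perl = TRUE selects PCRE, which understands the hexadecimal escape.
# useBytes = TRUE (an assumption of this sketch) matches raw bytes,
# so the invalid UTF-8 input is not rejected.
hits <- grepl("\\x96", x, perl = TRUE, useBytes = TRUE)
print(hits)  # TRUE FALSE

# Replace the stray byte with a plain ASCII hyphen.
fixed <- gsub("\\x96", "-", x, perl = TRUE, useBytes = TRUE)
print(fixed)
```

If the data actually comes from a Windows-1252 source, where 0x96 is an en dash and 0x92 a right single quote, converting it up front with iconv(x, from = "CP1252", to = "UTF-8") may be a cleaner fix than pattern matching on individual bytes, though that depends on where the rows originated.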
