Confusion with locale settings in R

Question

I just answered the question "Deleting characters after the EURO character in R", but my own answer does not work for me, even though the same R code works for others who are on Ubuntu.

This is my code:

    x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
    euro <- "\u20AC"
    gsub(paste(euro, "(\\S+)|."), "\\1", x)
    # ""

I think this comes down to the locale settings, but I don’t know how to change them.

I am running RStudio on Windows 8.

    > sessionInfo()
    R version 3.2.0 (2015-04-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 8 x64 (build 9200)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods
    [7] base

    loaded via a namespace (and not attached):
    [1] tools_3.2.0

Ananda's answer is good, but it means adding the encoding step every time we use Unicode characters in a regex. Is there a way to change the default encoding to UTF-8 on Windows?

+6

regex r

Avinash raj Jul 08 '15 at 9:53

1 answer

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2015-07-08T09:57:12+0000

There seems to be a problem with the encoding.
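
One way to look at this yourself is to compare how R has stored the pattern with what the session assumes for unmarked strings. The following is only a rough sketch: the variable names (`euro_mark`, `euro_code`, `euro_bytes`, `ctype`, `info`) are made up for illustration, and the locale values printed will differ from machine to machine.

```r
# Pattern side: a \u escape, which R stores as UTF-8 bytes.
euro <- "\u20AC"
euro_mark  <- Encoding(euro)    # encoding mark R attached to the string
euro_code  <- utf8ToInt(euro)   # Unicode code point: 8364 (hex 20AC)
euro_bytes <- charToRaw(euro)   # underlying bytes: e2 82 ac

# Session side: the character set R assumes for unmarked strings.
ctype <- Sys.getlocale("LC_CTYPE")
info  <- l10n_info()            # list including the `UTF-8` and `Latin-1` flags

print(euro_mark)
print(euro_code)
print(euro_bytes)
print(ctype)
print(info[["UTF-8"]])
```

If `ctype` reports a code page such as 1252 (as in the question's `sessionInfo()`) while the pattern is UTF-8, the two strings passed to `gsub()` are not in the same encoding, which is consistent with the empty result.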

Consider:

    x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
    euro <- "\u20AC"  # as defined in the question

    gsub(paste(euro, "(\\S+)|."), "\\1", x)
    # [1] ""

    gsub(paste(euro, "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF8"))
    # [1] "15,896.80"
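
A related approach, shown here only as a sketch and not as the method used above, is to convert the input to UTF-8 once with `iconv()` so that the subject and the pattern agree. To keep the example runnable on any platform, it simulates Windows-1252 input by converting a UTF-8 string to CP1252 first. The names `x_utf8`, `x_cp1252` and `x_fixed` are hypothetical, and the code assumes the local `iconv` build knows the name "CP1252".

```r
euro <- "\u20AC"

# Build the sentence from the question with an explicit UTF-8 euro sign.
x_utf8 <- paste0("services as defined in this SOW at a price of ", euro,
                 " 15,896.80 (if executed fro")

# Simulate what a Windows-1252 session holds: the euro sign becomes byte 0x80.
x_cp1252 <- iconv(x_utf8, from = "UTF-8", to = "CP1252")
has_cp1252_euro <- as.raw(0x80) %in% charToRaw(x_cp1252)

# Convert back to UTF-8 once, stating the source encoding explicitly.
x_fixed <- iconv(x_cp1252, from = "CP1252", to = "UTF-8")

# Pattern and subject are now both UTF-8, so the euro sign can match.
amount <- gsub(paste(euro, "(\\S+)|."), "\\1", x_fixed)
print(amount)
```

On a real Windows session the simulation step is unnecessary; the point is only that the conversion happens once on the data rather than inside every regex call.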
