R has several special locale-independent character classes available for regular expressions.
From ?regex:
'[[:alnum:]]' means '[0-9A-Za-z]', except the latter depends upon the locale and the character encoding, whereas the former is independent of locale and character set.
I would like to know when locale-related problems can occur.
I tried two examples based on information in the ?Comparison help page, which describes the sort order of strings:
in Estonian 'Z' comes between 'S' and 'T'
and
in Danish 'aa' sorts as a single letter, after 'z'
In the first example, I would expect T, U, V, W, X, and Y not to match. In the second example, I would expect no match at all.
Sys.setlocale("LC_ALL", "Estonian")
grepl("[A-Z]", LETTERS)
Sys.setlocale("LC_ALL", "Danish")
grepl("[a-z]", "aa")
Both calls return TRUE in every case, contrary to what I expected.
So under what conditions, if any, does the locale affect matching with [a-z]?
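To rule out the possibility that the locale change silently failed on my machine, something like the following sketch could be used. The helper name check_range_in_locale is made up for illustration, and it only switches LC_COLLATE (the category that governs the sort order described in ?Comparison) rather than LC_ALL. Locale names are platform-specific, so "Estonian" or "Danish" may need a different spelling on non-Windows systems; "C" is used in the example call because it should always be available.

```r
# Hypothetical helper (not part of base R): run a regex check under a given
# collation locale and restore the previous setting afterwards.
check_range_in_locale <- function(locale, pattern, x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)

  # Sys.setlocale() returns "" (and warns) when the requested locale
  # cannot be honoured, so an empty string means nothing was changed.
  got <- suppressWarnings(Sys.setlocale("LC_COLLATE", locale))
  if (!nzchar(got)) {
    return(list(locale_set = FALSE, sorted = NULL, matched = NULL))
  }

  list(
    locale_set = TRUE,
    sorted     = sort(x),           # sort() follows the locale's collation order
    matched    = grepl(pattern, x)  # how the regex range treats each element
  )
}

# Example call with the always-available "C" locale
res <- check_range_in_locale("C", "[A-Z]", LETTERS)
```

If locale_set comes back FALSE, the test says nothing about the locale at all. If it is TRUE and sorted shows the locale's special ordering while matched is still all TRUE, that would suggest the range is not being interpreted through the collation order.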
Related: [a-zA-Z] vs. [[:alpha:]].