regcomp (from glibc) is a POSIX function for compiling regular expressions.
int regcomp(regex_t *restrict preg, const char *restrict pattern, int cflags);
Some constructs in regular expressions depend on the notion of a single character, for example the bracket expression [abc].
If a multibyte encoding is in use and the expression contains a multibyte letter, the interpretation differs depending on whether the expression is treated as a sequence of bytes or as a sequence of multibyte characters.
Here I illustrate the idea with grep (whose behavior in this respect need not be the same as that of the C regcomp function):
$ { echo ; echo ; } | egrep '[]'
$ { echo ; echo ; } | LANG=C egrep '[]'
$
LANG is the fallback for any of the category-specific LC_* variables that are not set, so the question is: which of these categories determines regcomp's notion of the character encoding?
$ locale
LANG=ru_RU.utf8
LC_CTYPE="ru_RU.utf8"
LC_NUMERIC="ru_RU.utf8"
LC_TIME="ru_RU.utf8"
LC_COLLATE="ru_RU.utf8"
LC_MONETARY="ru_RU.utf8"
LC_MESSAGES=POSIX
LC_PAPER="ru_RU.utf8"
LC_NAME="ru_RU.utf8"
LC_ADDRESS="ru_RU.utf8"
LC_TELEPHONE="ru_RU.utf8"
LC_MEASUREMENT="ru_RU.utf8"
LC_IDENTIFICATION="ru_RU.utf8"
LC_ALL=
$
imz - Ivan Zakharyaschev