Is \0 ("\\0" in a C-style string literal) a valid escape sequence in C++ regular expressions?

NOTE: When I write the regex [\0], I mean the regular expression itself, not a C-style string literal (which would be "[\\0]"). If there are no quotation marks around it, it is not a C-style string, and the backslash should not be read as a C string escape.

Inspired by this question and my own research, I tried the following code with clang 3.4:

    #include <regex>
    #include <string>

    int main()
    {
        std::string input = "foobar";
        std::regex regex("[^\\0]*"); // Note, this is "\\0", not "\0"!
        return std::regex_match(input, regex);
    }

Apparently clang doesn't like this, because the program throws:

std::__1::regex_error: The expression contained an invalid escaped character, or a trailing escape.

The culprit seems to be the [^\0] part (changing it to [^\n] or something similar works fine): \0 is apparently treated as an invalid escape. To be clear, I'm not talking about the character '\0' (the null character) or '\n' (the newline character). In terms of C-style strings, I'm talking about "\\0" (a string containing a backslash followed by 0) and "\\n" (a string containing a backslash followed by n). The regex engine seems to convert "\\n" to a newline, but it chokes on "\\0".
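To make the distinction between the two escaping layers concrete, here is a minimal sketch that only looks at what the C++ compiler puts into the string, before any regex engine sees it. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper names, used only in this sketch.

// The regex text [^\0]* written as an ordinary literal: the backslash is
// doubled so that the compiler emits a real backslash followed by '0'.
inline std::string escaped_literal() { return "[^\\0]*"; }

// The same regex text as a C++11 raw string literal: no doubling needed.
inline std::string raw_literal() { return R"([^\0]*)"; }

// What happens when the backslash is NOT doubled: the compiler turns \0 into
// a NUL character, and std::string(const char*) stops copying at that NUL.
inline std::string unescaped_literal() { return "[^\0]*"; }
```

The first two functions return the same six characters, with a backslash at index 2 and '0' at index 3, and that is the text the regex engine has to interpret. The third returns only "[^", which is not the regex being asked about.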

The C++11 standard says in section 28.13 [re.grammar]:

The regular expression grammar recognized by basic_regex objects constructed with the ECMAScript flag is that specified by ECMA-262, except as specified below.

I'm not an ECMA-262 expert, but I tried the regular expression in JSFiddle, and it works fine in JavaScript land.

So now I'm wondering whether the regular expression [^\0] is legal in ECMA-262, and whether the C++11 standard changes that (in the material following "... except as specified below.").

Question: Is \0 (not the null character; in a string literal it would be "\\0") a legal escape sequence in a C++11 regular expression? Is it legal in ECMA-262 (or are the browsers' JS VMs just being "too" lenient)? What is the reason/excuse for the differing behavior?

c++ javascript regex ecma262 c++11
1 answer

This was a bug in libc++'s <regex> implementation. It should now be fixed in trunk, and the fix should eventually make its way into the libc++ shipped with OS X.
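One way to check whether a particular standard library build still rejects the pattern is to construct the regex inside a try block and inspect the std::regex_error. The following is a minimal sketch; CompileResult and try_compile are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for this sketch: did the pattern compile, and if
// not, which error did the library report?
struct CompileResult {
    bool ok;
    std::regex_constants::error_type code;  // meaningful only when ok is false
    std::string message;                    // e.what(), implementation-defined
};

inline CompileResult try_compile(const std::string& pattern) {
    try {
        std::regex re(pattern);  // ECMAScript grammar is the default
        (void)re;
        return {true, std::regex_constants::error_type(), ""};
    } catch (const std::regex_error& e) {
        // error_escape is the code behind the "invalid escaped character,
        // or a trailing escape" message quoted in the question.
        return {false, e.code(), e.what()};
    }
}
```

Calling try_compile("[^\\0]*") should report ok on a library that contains the fix, and error_escape on an affected libc++. A pattern that ends in a lone backslash, such as "abc\\", is a simple way to see the error_escape path on any implementation.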

In addition, here is the excerpt from the ECMA-262 standard that the bug report is based on:

15.10.2.11 DecimalEscape

The production DecimalEscape :: DecimalIntegerLiteral [lookahead ∉ DecimalDigit] evaluates as follows:

  • Let i be the MV of DecimalIntegerLiteral.
  • If i is zero, return the EscapeValue consisting of a <NUL> character (Unicode value 0000).
  • Return the EscapeValue consisting of the integer i.

Note: ... \0 represents the <NUL> character and cannot be followed by a decimal digit.
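The three steps above can be sketched in C++ as follows. This is only an illustration of the quoted rule, not code from libc++ or any other implementation; EscapeValue and evaluate_decimal_escape are hypothetical names, and the function assumes it is handed the text that follows the backslash.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: either the <NUL> character or an integer.
struct EscapeValue {
    bool is_character;      // true: the escape denotes a character (<NUL>)
    char character;         // valid when is_character is true
    unsigned long integer;  // valid when is_character is false
};

// Evaluates a DecimalEscape starting at text[pos] (the character right after
// the backslash) and advances pos past the digits it consumed.
inline EscapeValue evaluate_decimal_escape(const std::string& text, std::size_t& pos) {
    auto is_digit = [&](std::size_t i) {
        return i < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[i])) != 0;
    };
    if (!is_digit(pos)) throw std::invalid_argument("not a DecimalEscape");

    // DecimalIntegerLiteral is a lone 0, or a non-zero digit plus more digits.
    std::size_t end = pos + 1;
    if (text[pos] != '0') {
        while (is_digit(end)) ++end;
    }

    // [lookahead is not a DecimalDigit]: \0 must not be followed by a digit.
    if (is_digit(end)) throw std::invalid_argument("digit after \\0");

    unsigned long i = std::stoul(text.substr(pos, end - pos));  // step 1: MV
    pos = end;
    if (i == 0) return {true, '\0', 0};  // step 2: <NUL> character
    return {false, '\0', i};             // step 3: the integer i
}
```

Under this reading, the text "0]" (as in [^\0]) yields the <NUL> character, while "01" does not match the production at all, which is what the lookahead restriction expresses.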
