Is it illegal to directly embed Unicode in a character literal instead of using a universal-character-name?

According to ISO/IEC 14882:2011 (§2.14.3), the grammar for a character literal (also called a character constant) is as follows.

character-literal:
    ' c-char-sequence '
    u' c-char-sequence '
    U' c-char-sequence '
    L' c-char-sequence '
...
c-char:
    any member of the source character set except the single-quote ', backslash \, or new-line character
    escape-sequence
    universal-character-name

At first glance, it seems that directly embedding a Unicode character in a character literal, instead of using a universal-character-name, is illegal. However, most compilers, such as g++ and Visual Studio C++, accept it without complaint, which is somewhat confusing. Does every implementation automatically convert these Unicode characters into universal-character-names before it starts compiling, regardless of the standard?

c++ c++11 unicode
2 answers

I think the first "translation phase" takes care of that (C++11 2.2/1, phase 1):

Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character.

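So your input files are encoded in the source character set, which includes the basic source character set, but in the program text every non-basic character is replaced by its universal-character-name. A directly embedded Unicode character therefore reaches the character-literal grammar already in the form of a universal-character-name, so it is legal.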


It is defined:

§2.2 Translation phases:

  • [...] The set of physical source file characters accepted is implementation-defined. [...] Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently [...].)
