Is it illegal to embed Unicode directly in a character literal instead of using a universal-character-name?

Question

According to ISO/IEC 14882:2011 (§2.14.3), the grammar of a character literal (also called a character constant) is as follows.

character-literal:
    ' c-char-sequence '
    u' c-char-sequence '
    U' c-char-sequence '
    L' c-char-sequence '
...
c-char:
    any member of the source character set except the single-quote ', backslash \, or new-line character
    escape-sequence
    universal-character-name

At first glance, it seems that embedding a Unicode character directly in a character literal, instead of using a universal-character-name, is illegal. However, most compilers, such as g++ and Visual C++, accept it without complaint, which is somewhat confusing. Does every implementation automatically convert such Unicode characters to universal-character-names before compilation starts, regardless of the standard?

c++ c++11 unicode

user3647351 May 17, '14 at 14:08

2 answers

This is defined by the standard:

§2.2 Phases of translation:

[...] The set of physical source file characters accepted is implementation-defined. [...] Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently [...].)

Oberon May 17, '14 at 14:21

Kerrek SB · Accepted Answer · 2014-05-17T14:20:38+0000

I think the first phase of translation handles this (C++11 2.2/1, phase 1):

Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character.

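So your input files are encoded in the physical source character set, which includes the basic source character set, but in the program text every character outside the basic source character set is replaced by its universal-character-name. A directly embedded Unicode character is therefore already a universal-character-name by the time the character-literal grammar is applied, so the literal is well-formed.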
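
To make the equivalence concrete, here is a minimal sketch that compares directly embedded characters with their universal-character-name spellings. It assumes the file is saved as UTF-8 and that the compiler decodes it as UTF-8 (the usual default for g++ and clang++; Visual C++ may need the /utf-8 option or a BOM). The function name literals_agree is made up for this illustration.

```cpp
#include <string>

// Assumption: this source file is UTF-8 and the compiler reads it as UTF-8.
// Under that assumption, phase 1 maps each extended character to the same
// universal-character-name that is spelled out by hand on the right-hand side.
static_assert(u'π' == u'\u03c0',
              "direct and \\u spellings should denote the same char16_t value");
static_assert(U'€' == U'\u20ac',
              "direct and \\u spellings should denote the same char32_t value");

// Hypothetical helper: checks the same equivalence for string literals at run time.
bool literals_agree() {
    const std::u16string direct  = u"π€";            // characters embedded directly
    const std::u16string escaped = u"\u03c0\u20ac";  // same characters as UCNs
    // Both characters are in the Basic Multilingual Plane, so each takes
    // exactly one UTF-16 code unit.
    return direct == escaped && direct.size() == 2;
}
```

If the compiler decodes the file with a different encoding than the one it was saved in, the static_asserts may fail; that is the "implementation-defined" part of phase 1, not a problem with the literal grammar.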
