I’ve been digging through the spec for a long time and can’t find a definitive yes/no answer.
Does the following declaration:
char16_t *s = u"asdf";
imply / guarantee that the string literal "asdf" must be encoded in UTF-16?
From everything I can deduce, the answer appears to be yes.
However, the sentence in n2018 says that char16_t literals are UTF-16 encoded only when __STDC_UTF_16__ is defined, which leaves the door open: when __STDC_UTF_16__ is not defined, char16_t literals can be encoded however the compiler wants.
After all, the standard only guarantees the size, signedness, and underlying representation of char16_t; it says nothing about how the compiler must encode a char16_t character or string literal.
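For what it’s worth, the only portable compile-time check I could come up with is to test the feature macro directly. This is my own sketch, not something the standard prescribes; it just refuses to build when the implementation makes no UTF-16 promise:

#include <uchar.h>   /* for char16_t itself */

/* __STDC_UTF_16__ is a conditionally predefined macro; if it is absent,
   the implementation does not promise that char16_t values are UTF-16. */
#ifndef __STDC_UTF_16__
#error "char16_t values are not guaranteed to be UTF-16 encoded here"
#endif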
The specification states:
The size of a char16_t string literal is the total number of escape sequences, universal character names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'. [Note: The size of a char16_t string literal is the number of code units, not the number of characters. — end note]
This suggests that char16_t string literals are understood to be UTF-16 encoded, since “surrogate pair” is a UTF-16 concept.
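To make the surrogate-pair counting concrete, here is a small sanity check I put together (my own example, with my own array name, and it assumes __STDC_UTF_16__ is defined): U+1F600 lies outside the BMP, so under UTF-16 it should occupy two char16_t code units, plus one for the terminator.

#include <assert.h>
#include <uchar.h>

/* U+1F600 is outside the Basic Multilingual Plane, so in UTF-16 it is stored
   as a surrogate pair: two char16_t code units, plus the terminating u'\0'. */
static const char16_t grin[] = u"\U0001F600";
static_assert(sizeof grin / sizeof grin[0] == 3,
              "one surrogate pair + terminator = 3 code units");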
Let me know if there is anything vague in the question.