In C++11 you can write UTF-8/16/32 string literals by prefixing the string literal with u8/u/U respectively. How should the compiler interpret a UTF-8 file that has non-ASCII characters inside these new types of string literals? I understand that the standard does not specify file encodings, and that fact alone would make the interpretation of non-ASCII characters inside source code completely undefined, making the feature a little less useful.
I understand that you can still escape single Unicode characters with \uNNNN, but that is not very readable for, say, a complete Russian or French sentence, which usually contains more than one Unicode character.
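To illustrate, here is a minimal sketch (assuming the source file is saved as UTF-8 and the compiler reads it as such, which is exactly the point in question) comparing one escaped character with the same character typed directly:

    #include <cassert>
    #include <cstring>

    int main() {
        // \u00C7 is 'Ç'; in a u8 literal it should encode to the same
        // UTF-8 bytes (0xC3 0x87) as the directly typed character.
        const char* escaped = u8"\u00C7a c'est un fait!";
        const char* direct  = u8"Ça c'est un fait!";
        assert(std::strcmp(escaped, direct) == 0);
    }

Writing a whole sentence that way, one \uNNNN at a time, is clearly not pleasant.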
What I understand from various sources is that u should become equivalent to L on current Windows implementations and U on, e.g., Linux implementations. So with that in mind, I am also curious what the required behavior is for the old string literal modifier...
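As a quick sanity check on a given implementation (just a sketch; none of these sizes are mandated by the standard), one can print the character widths and see which new literal type the old L literal lines up with:

    #include <cstdio>

    int main() {
        // On Windows wchar_t is typically 2 bytes (so L behaves like u);
        // on Linux it is typically 4 bytes (so L behaves like U).
        std::printf("wchar_t:  %zu\n", sizeof(wchar_t));
        std::printf("char16_t: %zu\n", sizeof(char16_t));
        std::printf("char32_t: %zu\n", sizeof(char32_t));
    }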
For the code monkeys, some sample code:
    // "utf8string" etc. were placeholders; with the real C++11 types:
    std::string    a = u8"L'hôtel de ville doit être là-bas. Ça c'est un fait!";
    std::u16string b = u"L'hôtel de ville doit être là-bas. Ça c'est un fait!";
    std::u32string c = U"L'hôtel de ville doit être là-bas. Ça c'est un fait!";
In an ideal world, all these lines produce the same content (that is, the same characters after conversion), but my experience with C++ has taught me that this is most definitely implementation-defined, and probably only the first one will do what I want.
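If it helps, here is a sketch of how one could check "same content" at run time, converting the UTF-16 and UTF-32 literals back to UTF-8 with the C++11 <codecvt> facets and comparing (assuming those facets are implemented correctly; this is only a diagnostic, not part of the question):

    #include <cassert>
    #include <codecvt>
    #include <locale>
    #include <string>

    int main() {
        std::string    a = u8"L'hôtel de ville doit être là-bas.";
        std::u16string b = u"L'hôtel de ville doit être là-bas.";
        std::u32string c = U"L'hôtel de ville doit être là-bas.";

        // Convert UTF-16 -> UTF-8 and UTF-32 -> UTF-8, then compare bytes.
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> c16;
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> c32;

        assert(c16.to_bytes(b) == a);
        assert(c32.to_bytes(c) == a);
    }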
c++ encoding c++11 string-literals
rubenvb Jul 22 '11 at 18:40