C++, cout and UTF-8

Hopefully a simple question: cout seems to die when processing strings ending in a multibyte UTF-8 character. Am I doing something wrong? This is with GCC (MinGW) on Win7 x64.

Edit: Sorry if I wasn't clear enough. I'm not interested in the missing glyphs or in how the bytes are interpreted. The problem is that nothing appears after the call to cout << s4 (there is no BAR). Any further cout after that first output produces no text at all!

    #include <cstdio>
    #include <iostream>
    #include <string>

    int main()
    {
        std::string s1("abc");
        std::string s2("…"); // … = 0xE2 80 A6
        std::string s3("…abc");
        std::string s4("abc…");

        //In C
        fwrite(s1.c_str(), s1.size(), 1, stdout);
        printf(" FOO ");
        fwrite(s2.c_str(), s2.size(), 1, stdout);
        printf(" BAR ");
        fwrite(s3.c_str(), s3.size(), 1, stdout);
        printf(" FOO ");
        fwrite(s4.c_str(), s4.size(), 1, stdout);
        printf(" BAR\n\n");

        //C++
        std::cout << s1 << " FOO " << s2 << " BAR " << s3 << " FOO " << s4 << " BAR ";
    }

    // results:
    // abc FOO     BAR    abc FOO abc… BAR
    // abc FOO     BAR    abc FOO abc…
+8
c++ utf-8 cout
4 answers

This is hardly surprising. Unless your terminal is set to UTF-8 encoding, how would it know that s2 is not supposed to be "(Latin small letter a with circumflex) (Euro sign) (pipe)"? That is what those three bytes mean if your terminal is set to ISO-8859-1, according to http://www.ascii-code.com/

By the way, cout does not "die": it clearly continues to produce output after your test string.
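
To make the byte-level view concrete, here is a minimal sketch that dumps the raw bytes of a string in hex. The helper name `hex_bytes` is hypothetical and not part of the question's code. It only shows what bytes the program hands to the terminal. How the terminal displays them depends on its encoding.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the question): renders each byte of s
// as two uppercase hex digits, separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `hex_bytes("\xE2\x80\xA6")` yields `E2 80 A6`, the three bytes noted in the comment next to s2. A terminal that is not decoding UTF-8 will treat them as three separate characters.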

+1

If you want your program to use your current locale, call setlocale(LC_ALL, "") as the first thing in your program. Otherwise the program's locale is "C", and what it will do with non-ASCII characters is not knowable by us mere humans.
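
A minimal sketch of that call, assuming nothing beyond the standard `<clocale>` header. The helper names `current_locale_name` and `use_environment_locale` are made up for illustration. Whether switching the locale actually fixes the output in the question is not something this sketch establishes.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: returns the name of the currently active C locale.
// Passing nullptr only queries the locale, it does not change it.
std::string current_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Hypothetical helper: switches to the locale named by the environment.
// Returns false if the request cannot be honored, in which case the
// locale is left unchanged.
bool use_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

Calling `use_environment_locale()` at the top of `main`, before any output, matches the advice above. Until it is called, `current_locale_name()` reports `"C"`.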

+4

By default, the Windows console does not handle characters outside the local code page.

You need to make sure that a Unicode-capable font is selected in the console window, and that the code page is set to UTF-8 through a call to chcp. Even then, success is not guaranteed. Note that `wcout` changes nothing if the console cannot display the fancy characters because its font is not up to it.

On all modern Linux distributions, the console is configured to UTF-8, and this should work out of the box.

0

As others have noted, std::cout is agnostic about this, at least in the "C" locale (the default). Your console window, on the other hand, has to be configured to display UTF-8, which is code page 65001. Try running chcp 65001 before running your program. (This has worked for me in the past.)
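
If you would rather set the code page from inside the program than type chcp first, a rough sketch follows. It assumes the Win32 function `SetConsoleOutputCP` is an acceptable stand-in for chcp 65001, which the answers here do not state. The wrapper name `set_console_utf8` is hypothetical. The font caveat mentioned in another answer still applies.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks the Windows console to treat program output
// as UTF-8. Returns true on success. On non-Windows platforms it does
// nothing and returns true.
bool set_console_utf8() {
#ifdef _WIN32
    // CP_UTF8 is code page 65001, the same number passed to chcp.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Call it once at the start of `main`, before the first write to stdout or std::cout.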

0
