Printing Unicode characters in C++

I am trying to write a simple command-line application to teach myself Japanese, but I cannot get Unicode characters to print. What am I missing?

    #include <cstdlib>
    #include <iostream>
    using namespace std;

    int main()
    {
        wcout << L"こんにちは世界\n";
        wcout << L"Hello World\n";
        system("pause");
    }

In this example, only “Press any key to continue” is displayed. Tested in Visual C++ 2013.

3 answers

This is not easy on Windows. Even when you manage to get the text to the Windows console, you still need to configure cmd.exe so that it can display Japanese glyphs.


    #include <iostream>

    int main()
    {
        std::cout << "こんにちは世界\n";
    }

This works fine on any system where:

  • The compiler's source and execution encodings both include the characters.
  • The output device (for example, the console) expects text in the same encoding as the compiler's execution encoding.
  • A font with the appropriate glyphs is available (usually this is not a problem).

Most platforms today use UTF-8 by default for all of these encodings and therefore can support the entire Unicode range with code similar to the one above. Unfortunately, Windows is not one of these platforms.

    wcout << L"こんにちは世界\n";

On this line, the string literal data is converted (at compile time) from the source encoding to the wide execution encoding, and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where this goes wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.

Thus, the conversion results in an error, putting wcout in a bad state. The error must be cleared before wcout functions again, so the second print request does not print anything.
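The "bad state" behaviour can be sketched with any wide output stream. The helper below is hypothetical (the name write_and_recover is not from the question or the answer). It only illustrates that a failed stream ignores further writes until clear() is called.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes `text` and reports whether the write succeeded.
// If the stream is in a failed state afterwards (for example because a
// character could not be converted), the error flags are cleared so that
// later writes can work again.
bool write_and_recover(std::wostream& out, const std::wstring& text)
{
    out << text;          // does nothing if the stream has already failed
    if (out.fail()) {     // true when failbit or badbit is set
        out.clear();      // reset the error state
        return false;
    }
    return true;
}
```

Called as write_and_recover(std::wcout, L"...") before the "Hello World" line, this would let the second line print even if the first one failed. It does not make the Japanese text itself appear.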


You can get around this for a limited range of characters by imbuing wcout with a locale that successfully converts the characters. Unfortunately, the encoding needed to support the entire Unicode range is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.

For example:

    wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
    SetConsoleOutputCP(CP_UTF8);
    wcout << L"こんにちは世界\n";

Here wcout will correctly convert the string to UTF-8, and if the output was written to a file instead of the console, then the file will contain the correct UTF-8 data. However, the Windows console, although configured here to receive UTF-8 data, simply will not accept UTF-8 data written this way.
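To check the conversion itself independently of the console, the same facet can be used in memory. This is a minimal sketch: the helper name to_utf8 is made up for illustration, and note that <codecvt> and std::wstring_convert have been deprecated since C++17.

```cpp
#include <codecvt>   // deprecated since C++17
#include <locale>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 bytes using the
// same codecvt_utf8_utf16 facet that the snippet above imbues into wcout.
std::string to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    return convert.to_bytes(wide);
}
```

For example, to_utf8(L"\u3053") (the character こ) yields the three bytes E3 81 93, which is the UTF-8 data that would end up in a file.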


There are several options:

  • Avoid the standard library:

     #include <windows.h>

     DWORD n;
     WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
  • Use a non-standard magic incantation that breaks standard functionality:

     #include <fcntl.h>
     #include <io.h>

     _setmode(_fileno(stdout), _O_U8TEXT);
     std::wcout << L"こんにちは世界\n";

    After this mode is set, std::cout << "Hello, World"; will fail.

  • Use a low-level I/O API and manual conversion:

     #include <codecvt>
     #include <cstdio>
     #include <locale>

     SetConsoleOutputCP(CP_UTF8);
     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
     // to_bytes returns a std::string, so pass its C string to puts
     std::puts(convert.to_bytes(L"こんにちは世界\n").c_str());

Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes: seven little boxes for the given string.

[Image: little boxes]

You can copy the text from cmd.exe into notepad.exe or another program to see the correct glyphs.


Here is a whole article about handling Unicode in the Windows console:

http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

Basically, you can implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever other Unicode encoding you want) to the Windows console without depending on locales or console code pages, and even without using wide characters.
It may not look very simple, but it is a convenient and reusable solution that also lets you keep your user code portable in the utf8-everywhere style. Please don't beat me up over my English :)
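A rough sketch of that idea follows. It is not the code from the linked articles. The names SinkBuf and write_utf8_to_console are made up for illustration, and the buffer assumes that flushes happen on whole UTF-8 sequences. The streambuf collects narrow UTF-8 bytes and hands them to a sink on flush, and on Windows the sink converts them to UTF-16 with MultiByteToWideChar and writes them with WriteConsoleW.

```cpp
#include <functional>
#include <iostream>
#include <streambuf>
#include <string>
#include <utility>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical streambuf: collects UTF-8 bytes and passes them to a sink
// whenever the owning stream is flushed.
class SinkBuf : public std::streambuf {
public:
    using Sink = std::function<void(const std::string&)>;
    explicit SinkBuf(Sink sink) : sink_(std::move(sink)) {}
    ~SinkBuf() override { sync(); }   // deliver anything still pending

protected:
    // No put area is set up, so every character arrives here.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            pending_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
    // Called on flush: forward the collected bytes to the sink.
    int sync() override {
        if (!pending_.empty()) {
            sink_(pending_);
            pending_.clear();
        }
        return 0;
    }

private:
    Sink sink_;
    std::string pending_;
};

#ifdef _WIN32
// Hypothetical sink: converts UTF-8 to UTF-16 and writes it to the console.
void write_utf8_to_console(const std::string& utf8) {
    const int bytes = static_cast<int>(utf8.size());
    const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, nullptr, 0);
    if (len <= 0) return;
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &wide[0], len);
    DWORD written = 0;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), wide.data(),
                  static_cast<DWORD>(wide.size()), &written, nullptr);
}
#endif
```

On Windows one could then construct SinkBuf buf(write_utf8_to_console); and call std::cout.rdbuf(&buf), restoring the original buffer before buf goes out of scope. Error handling and redirected output (where the handle is not a console) are left out of this sketch.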


Or you can change the Windows locale to Japanese.
