Printing Unicode characters in C++

Question


I am trying to write a simple command-line application to teach myself Japanese, but I cannot get Unicode characters to print. What am I missing?

    #include <cstdlib>
    #include <iostream>
    using namespace std;

    int main()
    {
        wcout << L"こんにちは世界\n";
        wcout << L"Hello World\n";
        system("pause");
    }

In this example, only “Press any key to continue” is displayed. Tested in Visual C++ 2013.

+7

c++ unicode

jeffythedragonslayer Sep 19 '13 at 20:14


3 answers

bames53 · Answer 1 · 2013-09-19T23:00:28+0000

This is not easy on Windows. Even when you manage to get the text to the Windows console, you still need to configure cmd.exe to be able to display Japanese characters.

    #include <iostream>

    int main()
    {
        std::cout << "こんにちは世界\n";
    }

This works fine on any system where:

The compiler's source and execution encodings both include the characters.
The output device (for example, the console) expects text in the same encoding as the compiler's execution encoding.
A font with the appropriate glyphs is available (usually not a problem).

Most platforms today use UTF-8 by default for all of these encodings and therefore can support the entire Unicode range with code similar to the one above. Unfortunately, Windows is not one of these platforms.
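
One way to check the first two conditions is to look at the bytes the compiler actually stored for a narrow literal. The helper below is only a minimal sketch for that purpose; hex_bytes is a name made up for this illustration and does not come from the answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: render each byte of `bytes` as two
// lowercase hex digits, separated by single spaces.
std::string hex_bytes(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

Printing hex_bytes("こ") shows e3 81 93 when the execution encoding is UTF-8. Any other byte sequence means the compiler stored the literal in a different encoding, and the console would have to expect that same encoding for the text to display correctly.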

 wcout << L"こんにちは世界\n";

On this line, the string literal data is converted (at compile time) from the source encoding to the wide execution encoding, and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where this goes wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.

Thus, the conversion results in an error, putting wcout in a bad state. The error must be cleared before wcout functions again, so the second print request does not print anything.
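
The state handling can be sketched portably with a string stream. This is an illustration under assumptions: write_or_recover and simulate_failed_then_recovered are hypothetical names, and the failed conversion is simulated by setting badbit by hand, since a string stream performs no character conversion of its own.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: write `text` to `out`. If the stream is (or ends up)
// in a failed state, reset the state flags and report the failure.
bool write_or_recover(std::wostream& out, const std::wstring& text) {
    out << text;
    if (out.fail()) {  // true when failbit or badbit is set
        out.clear();   // reset the state so later writes are attempted again
        return false;
    }
    return true;
}

// Simulation only: badbit is set manually as a stand-in for the failed
// wchar_t-to-char conversion described above.
std::wstring simulate_failed_then_recovered() {
    std::wostringstream out;
    out.setstate(std::ios_base::badbit);      // stand-in for the conversion error
    out << L"first line\n";                   // ignored: the stream is in a bad state
    write_or_recover(out, L"second line\n");  // also ignored, but clears the state
    out << L"third line\n";                   // written, the state is good again
    return out.str();
}
```

With wcout the equivalent recovery step is std::wcout.clear(). Clearing only restores the stream state; printing the same unconvertible characters again would put the stream back into the bad state.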

You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately, the encoding needed to support the entire Unicode range this way is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.

For example:

    wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
    SetConsoleOutputCP(CP_UTF8);
    wcout << L"こんにちは世界\n";

Here wcout will correctly convert the string to UTF-8, and if the output was written to a file instead of the console, then the file will contain the correct UTF-8 data. However, the Windows console, although configured here to receive UTF-8 data, simply will not accept UTF-8 data written this way.

There are several options:

Avoid the standard library:

    DWORD n;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);

Use a non-standard magic incantation that will break standard code:

    #include <fcntl.h>
    #include <io.h>

    _setmode(_fileno(stdout), _O_U8TEXT);
    std::wcout << L"こんにちは世界\n";

After setting this mode, narrow output such as std::cout << "Hello, World"; will fail.

Use a low-level I/O API along with manual conversion:

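    #include <codecvt>
    #include <cstdio>
    #include <locale>
    #include <windows.h>

    SetConsoleOutputCP(CP_UTF8);
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    // to_bytes returns a std::string, but std::puts takes a const char*
    std::puts(convert.to_bytes(L"こんにちは世界\n").c_str());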

Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes: seven little boxes for the given string.

You can copy the text from cmd.exe into notepad.exe or some other program to see the correct glyphs.
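
The manual-conversion option relies on std::wstring_convert and std::codecvt_utf8_utf16, both of which were deprecated in C++17. A hand-written encoder is one possible substitute. The following is a minimal sketch, not the answer's code: append_utf8 and to_utf8 are hypothetical names, and the input is assumed to be UTF-32 code points rather than wchar_t data.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 and append it
// to `out`. Surrogates and values above U+10FFFF are replaced with U+FFFD.
void append_utf8(std::string& out, char32_t cp) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a whole UTF-32 string to UTF-8 bytes.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) append_utf8(out, cp);
    return out;
}
```

Writing the literal with universal character names, for example U"\u3053\u3093\u306B\u3061\u306F\u4E16\u754C" for こんにちは世界, keeps the result independent of the source file's encoding. On Windows the resulting bytes would still be written after SetConsoleOutputCP(CP_UTF8), as in the third option.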

user2665887 · Answer 2 · 2013-09-19T23:03:46+0000

There is a whole series of articles about working with Unicode in the Windows console:

http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

Basically, you can implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever other Unicode encoding you want) to the Windows console without depending on locales or console code pages, and even without using wide characters.
It may not look very straightforward, but it is a convenient and reusable solution, and it also lets you keep your user code portable in the utf8-everywhere style. Please excuse my English :)
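
A rough sketch of that idea, and not the code from the linked articles: sink_buf and write_utf8_to_console are hypothetical names, and the buffer hands its text to a callback so that the same class can be tried without a Windows console. The Windows-only part assumes the pending text ends on a complete UTF-8 sequence whenever the stream is flushed.

```cpp
#include <cstddef>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stream buffer: collects narrow (UTF-8) output and passes it
// to a sink callback whenever the stream is flushed.
class sink_buf : public std::streambuf {
public:
    using sink_type = std::function<void(const std::string&)>;
    explicit sink_buf(sink_type sink) : sink_(std::move(sink)) {}

protected:
    // No put area is set up, so single characters arrive here.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            pending_ += traits_type::to_char_type(ch);
        return traits_type::not_eof(ch);
    }
    // Bulk writes are appended directly.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        pending_.append(s, static_cast<std::size_t>(n));
        return n;
    }
    // Called on std::flush and std::endl: hand the pending text to the sink.
    int sync() override {
        if (sink_ && !pending_.empty()) {
            sink_(pending_);
            pending_.clear();
        }
        return 0;
    }

private:
    sink_type sink_;
    std::string pending_;
};

#ifdef _WIN32
#include <windows.h>
// Possible Windows sink: convert UTF-8 to UTF-16 and call WriteConsoleW.
void write_utf8_to_console(const std::string& utf8) {
    const int size = static_cast<int>(utf8.size());
    const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), size, nullptr, 0);
    if (len <= 0) return;
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), size, &wide[0], len);
    DWORD written = 0;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), wide.data(),
                  static_cast<DWORD>(wide.size()), &written, nullptr);
}
#endif
```

To use it, construct a sink_buf with write_utf8_to_console, attach it with std::cout.rdbuf(&buf), and restore the original buffer before buf is destroyed. Text that has not been flushed when the buffer goes away is dropped in this sketch.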

zettsett · Answer 3 · 2013-11-03T23:48:33+0000

Or you can change the Windows locale to Japanese.
