What is the difference between printf and std::ostream on the Windows console with UTF-8 output

I have a program that outputs a UTF-8 string to the console:

    #include <stdio.h>

    int main()
    {
        printf(" Peace Ειρήνη\n");
        return 0;
    }

I configure the console to use a TrueType font (Lucida Console), set the UTF-8 code page (chcp 65001), and compile this program with both MinGW GCC and Visual Studio 2010. It works fine; I see the output:

  Peace Ειρήνη 

I am doing the same with std::cout

    #include <iostream>

    int main()
    {
        std::cout << " Peace Ειρήνη\n";
        return 0;
    }

As mentioned above, this works fine with MinGW GCC, but with Visual Studio 2010 I get squares instead of letters (two for each non-ASCII letter).

If I run the program with its output redirected, test >test.txt , I get correct UTF-8 output in the file.

Both tests are performed on Windows 7.

Questions:

  • What do printf and std::cout in the Visual Studio standard library do differently when processing the output stream, such that one of them works and the other doesn't?
  • How can this be fixed?

The gist of the answer is:

In short: you're out of luck. std::cout does not really work with MSVC + UTF-8, or at least it takes a lot of effort to make it behave sensibly.

At length: read the two articles referenced in the answer below.

1 answer

You have a number of erroneous assumptions; first correct them:

  • That something seems to work with g++ does not mean that g++ is handling it correctly.

  • Visual Studio is not a compiler; it is an IDE that supports many languages and compilers.

  • The conclusion that the Visual C++ standard library needs fixing is correct, but the argument leading to that conclusion is not. The g++ standard library needs fixing too, not to mention the g++ compiler.

Visual C++ uses Windows ANSI, the encoding whose code page is reported by the GetACP API, as its undocumented C++ execution character set. Even if your source code is UTF-8 with a BOM, narrow string literals end up translated to Windows ANSI. If, at compile time, that happens to be a code page containing all the non-ASCII characters in your source, then OK; otherwise the narrow strings are garbled. The description of your test results is therefore seriously incomplete without mentioning the source code encoding and your Windows ANSI code page.

But in any case, "If I run the program with its output redirected, test >test.txt , I get correct UTF-8 output in the file" means that you have stumbled on a bit of C++-level helpfulness from the Visual C++ runtime, where it bypasses the stream output and uses direct console output in order to get the right characters displayed in the console window.

This helpfulness produces garbage when its assumptions, such as narrow string literals being Windows ANSI encoded, do not hold.

It also means that the effect mysteriously disappears when the stream is redirected: the runtime library then detects that the stream goes to a file and turns off the direct console output. You then get the original byte values, as you apparently did, which unfortunately masked the problem.

By the way, code page 65001 in the Windows console is not usable in practice. Many programs simply crash with it, including, for example, more.


One way to get correct results is to use the Windows API level directly, with direct console output.

Getting the right output with C ++ streams is a lot harder.

It is so complicated that a description will not fit here (really!), so I must instead refer you to my 2-part blog article about it: Part 1 and Part 2 .
