The Windows console window is pure Unicode. Its buffer stores text as UCS-2 Unicode (16 bits per character, essentially the original Unicode, which is a restriction to the Basic Multilingual Plane of modern 21-bit Unicode). So the console window can present almost all kinds of text.
However, for single-byte-per-character I/O (and possibly also for some variable-length encodings), Windows automatically translates to and from the console window's active code page. If the console window is a cmd.exe instance, you can inspect the active code page with the chcp command, which is short for "change code page". Like this:
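To make the UCS-2 restriction concrete, here is a minimal sketch that encodes a code point as UTF-16 and reports whether it fits in the single 16-bit unit that UCS-2 allows. The function names are my own, not any Windows API, and the code assumes the input is a valid code point.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
// Code points in the Basic Multilingual Plane (up to U+FFFF) need one 16-bit
// unit, which is all UCS-2 can express; anything above needs a surrogate pair.
// Assumption: cp is a valid code point (not a surrogate, at most U+10FFFF).
std::u16string to_utf16(char32_t cp)
{
    std::u16string units;
    if (cp <= 0xFFFF) {
        units.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;                               // 20 bits remain
        units.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return units;
}

// True if the code point fits in a single 16-bit unit, i.e. UCS-2 can hold it.
bool fits_in_ucs2(char32_t cp)
{
    return to_utf16(cp).size() == 1;
}
```

With this, å (U+00E5) takes one unit, while a character such as U+1F600 needs the surrogate pair D83D DE00 and so does not fit in a single UCS-2 cell.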
C:\test> chcp
Active code page: 850
C:\test> _
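The same number can be read from inside a program. The sketch below assumes the Win32 function GetConsoleOutputCP from <windows.h>; code_page_label and console_output_code_page are names I made up for illustration, and the labels are my own shorthand rather than official names. On a non-Windows build the query simply returns 0.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Short labels for the code pages mentioned in this answer.
// These are informal descriptions, not an official or complete list.
std::string code_page_label(unsigned code_page)
{
    switch (code_page) {
        case 437:  return "OEM, original IBM PC";
        case 850:  return "OEM, Western European";
        case 865:  return "OEM, Nordic";
        case 1252: return "Windows ANSI Western";
        default:   return "other";
    }
}

// Returns the code page the console uses for output.
// Assumption: on non-Windows builds there is nothing to ask, so this returns 0.
unsigned console_output_code_page()
{
#ifdef _WIN32
    return GetConsoleOutputCP();
#else
    return 0;
#endif
}
```

Printing code_page_label(console_output_code_page()) from main on the Norwegian PC above would be expected to give the "OEM, Western European" label, matching what chcp reports.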
Codepage 850 is an encoding based on the original IBM PC English code page 437. 850 is the default for console windows on at least Norwegian PCs (although savvy Norwegians may change it to 865). None of these, however, are code pages you should use.
The original IBM PC code page (character encoding) is known as OEM, a meaningless acronym for Original Equipment Manufacturer. It had nice line-drawing characters suitable for the original PC's text mode screen. More generally, OEM means the default code page for console windows, where code page 437 is just the original one: it can be configured, e.g. per window via chcp.
When Microsoft created 16-bit Windows, they chose another encoding, known in Windows as ANSI. The original one was an extension of ISO Latin-1, which for a long time was the default on the Internet (although it is unclear which came first: Microsoft participated in the standardization). That original ANSI is now known as Windows ANSI Western.
ANSI is the code page used for non-Unicode text in almost all the rest of Windows. Console windows use OEM. Notepad, other editors, and so on use ANSI.
Then, when Microsoft made Windows 32-bit, they adopted a 16-bit extension of Latin-1 known as Unicode. Microsoft was an original founding member of the Unicode Consortium. The basic API, including console windows, the file system and so on, was rewritten to use Unicode. For backward compatibility there is a translation layer that translates between OEM and Unicode for console windows, and between ANSI and Unicode for other functionality. For example, MessageBoxA is an ANSI wrapper for the Unicode-based MessageBoxW.
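The wrapper pattern can be sketched in portable C++. Nothing here is the real Windows implementation: show_message_w and show_message_a are hypothetical stand-ins for a W function and its A wrapper, and the conversion only covers the part of Windows ANSI Western where the code point equals the byte value.

```cpp
#include <string>

// Hypothetical stand-in for a Unicode ("W") function: the real work is done
// on UTF-16 text. Here it just returns the text it would display.
std::u16string show_message_w(const std::u16string& text)
{
    return text;
}

// Widens Windows ANSI Western (code page 1252) text to UTF-16.
// Assumption: only ASCII and bytes 0xA0..0xFF are handled, since there the
// code point equals the byte value. Bytes 0x80..0x9F would need a lookup
// table and become '?' in this sketch.
std::u16string widen_ansi_western(const std::string& bytes)
{
    std::u16string wide;
    for (const unsigned char b : bytes) {
        const bool direct = (b < 0x80 || b >= 0xA0);
        wide.push_back(direct ? static_cast<char16_t>(b) : u'?');
    }
    return wide;
}

// Hypothetical "A" wrapper: translate to Unicode, then forward to the "W" function.
std::u16string show_message_a(const std::string& text)
{
    return show_message_w(widen_ansi_western(text));
}
```

The point of the design is that only the W function does real work, so the narrow version costs one conversion per call and can never express more than its code page allows.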
The practical upshot is that in Windows your C++ source code is typically ANSI-encoded, while console windows assume OEM. That, for example, makes
cout << "I like Norwegian blåbærsyltetøy!" << endl;
produce pure gobbledegook… You can use the Unicode-based console window APIs to output Unicode directly to the console window, avoiding the translation, but that is inconvenient.
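To see where the gobbledegook comes from, here is a small portable sketch that takes ANSI (code page 1252) bytes and returns, as UTF-8, what a code page 850 console would display for them. The function names are my own, and the table only covers ASCII plus the three Norwegian lowercase letters; any other byte becomes '?'.

```cpp
#include <string>

// What a code page 850 console displays for one byte, returned as UTF-8.
// Assumption: only ASCII and the three bytes that code page 1252 uses for
// å, æ and ø are covered; everything else becomes '?' in this sketch.
std::string cp850_glyph(unsigned char byte)
{
    if (byte < 0x80) return std::string(1, static_cast<char>(byte));
    switch (byte) {
        case 0xE5: return "\xC3\x95";  // å in 1252, shown as Õ in 850
        case 0xE6: return "\xC2\xB5";  // æ in 1252, shown as µ in 850
        case 0xF8: return "\xC2\xB0";  // ø in 1252, shown as ° in 850
        default:   return "?";
    }
}

// Reinterprets ANSI-encoded bytes the way a code page 850 console would.
std::string seen_on_cp850_console(const std::string& ansi_bytes)
{
    std::string shown;
    for (const unsigned char b : ansi_bytes) shown += cp850_glyph(b);
    return shown;
}
```

Feeding it the ANSI bytes of "blåbærsyltetøy" gives "blÕbµrsyltet°y", which is the kind of output the cout statement produces in an OEM console.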
Note that using wcout instead of cout does not help: by design, wcout just translates down from wide character strings to the program's narrow character set, discarding information on the way. It can be hard to believe that the C++ standard library provides a rather big chunk of very complex functionality that is meaningless (since those conversions could just as well have been supported by cout). But so it is, it is just meaningless. Possibly it was some kind of political compromise, but anyway, wcout does not help, even though, if it were meaningful in some way, it "should" logically help with this.
So how does a Norwegian novice programmer get e.g. "blåbærsyltetøy" presented correctly?
Well, by just changing the active code page to ANSI. Since most Western-country PCs have code page 1252 as ANSI, you can do that for a given command interpreter instance with
C:\test> chcp 1252
Active code page: 1252
C:\test> _
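A program can also make the change for itself instead of relying on the user running chcp. This is only a sketch, assuming the Win32 functions GetConsoleOutputCP and SetConsoleOutputCP from <windows.h>; the class name ConsoleOutputCodePage is hypothetical, and on non-Windows builds the class does nothing.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Switches the console's output code page for the lifetime of the object and
// restores the previous one afterwards.
// Assumption: on non-Windows builds this is a no-op and changed() is false.
class ConsoleOutputCodePage
{
public:
    explicit ConsoleOutputCodePage(unsigned code_page)
    {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();
        changed_ = SetConsoleOutputCP(code_page) != 0;
#else
        (void)code_page;  // nothing to configure outside Windows
#endif
    }

    ~ConsoleOutputCodePage()
    {
#ifdef _WIN32
        if (changed_) SetConsoleOutputCP(previous_);
#endif
    }

    ConsoleOutputCodePage(const ConsoleOutputCodePage&) = delete;
    ConsoleOutputCodePage& operator=(const ConsoleOutputCodePage&) = delete;

    // True if the code page was actually switched.
    bool changed() const { return changed_; }

private:
    unsigned previous_ = 0;
    bool changed_ = false;
};
```

Declaring ConsoleOutputCodePage ansi(1252); at the top of main would then cover the program's own output, and the previous code page comes back when the object goes out of scope. Unlike chcp, this leaves the command interpreter as the user configured it.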
Now old DOS programs, like e.g. edit.com (still present in Windows XP!), will produce some gobbledegook, because the line-drawing characters of the original PC character set do not exist in ANSI, and because national characters have different codes in ANSI. But hey, who uses old DOS programs? Not me!
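The line-drawing problem is easy to reproduce on paper. The sketch below decodes bytes as Windows ANSI Western (code page 1252) into UTF-8; the function name is my own, and it only handles ASCII and bytes 0xA0..0xFF, where the code point equals the byte value.

```cpp
#include <string>

// Decodes Windows ANSI Western (code page 1252) bytes to UTF-8.
// Assumption: only ASCII and bytes 0xA0..0xFF are handled, since there the
// code point equals the byte value. Bytes 0x80..0x9F become '?' here.
std::string ansi_western_to_utf8(const std::string& bytes)
{
    std::string utf8;
    for (const unsigned char b : bytes) {
        if (b < 0x80) {
            utf8 += static_cast<char>(b);
        } else if (b >= 0xA0) {
            utf8 += static_cast<char>(0xC0 | (b >> 6));    // UTF-8 lead byte
            utf8 += static_cast<char>(0x80 | (b & 0x3F));  // UTF-8 continuation byte
        } else {
            utf8 += '?';
        }
    }
    return utf8;
}
```

The top edge of a DOS box, ┌──┐, is the bytes DA C4 C4 BF in code page 437. Decoded as ANSI, the same bytes come out as "ÚÄÄ¿".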
If you want this to be a more permanent code page, you will have to reconfigure the console windows using an undocumented registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
In this key, change the OEMCP value to 1252 and reboot.
As with chcp, or any other way of changing the code page to 1252, this makes old DOS programs present gobbledegook, but makes C++ programs and other modern console programs work fine.
That is because you then have the same character encoding in console windows as in the rest of Windows.