Display extended ASCII characters

Question

In Visual Studio 2005 on 32-bit Windows, why doesn't my console display characters from 128 to 255?

For example:

cout << "¿" << endl; //inverted question mark

Output:

 ┐ Press any key to continue . . .

+6

c++ x86 windows visual-studio-2005

user3234 Feb 03 '11 at 2:08

4 answers

When you print a narrow ("ASCII") string, Windows internally converts it to Unicode based on the current code page. The CRT also performs a translation from Unicode to "ASCII". The following will work.

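 #include <fcntl.h>    // _O_U16TEXT
 #include <io.h>       // _setmode
 #include <stdio.h>    // _fileno, stdout
 #include <iostream>

 int main()
 {
     // Put stdout in UTF-16 text mode so wide characters reach the console unconverted.
     _setmode(_fileno(stdout), _O_U16TEXT);
     std::wcout << L"\u00BF";    // U+00BF INVERTED QUESTION MARK
     return 0;
 }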

+3

John Feb 03 '11 at 2:48

Because the Win32 console uses code page 437 (also known as the OEM character set) to render characters, while most of the rest of Windows uses Windows-1252 for single-byte character codes.

The "¿" character is the Unicode INVERTED QUESTION MARK, which has the code 0xBF (191 decimal) in Unicode, ISO 8859-1 and Windows-1252. The code point 0xBF in CP437 corresponds to the character "┐", which is BOX DRAWINGS LIGHT DOWN AND LEFT (code point U+2510).

As long as you use the Windows console, you can display only the characters in CP437 and no others. If you want to display other Unicode characters, you need to use a different environment.

+2

Adam Rosenfield Feb 03 '11 at 4:10

It is probably implemented using the basic ASCII character set. Microsoft's programmers did not add UTF-8 capabilities when they created the console. This is just a guess, since I was not one of the Microsoft programmers who created the console.

0

JK. Feb 03 '11 at 2:12

Cheers and hth. · Accepted Answer · 2011-02-03T08:27:17+0000

A Windows console window is pure Unicode. Its buffer stores text as Unicode UCS-2 (16 bits per character, essentially the original Unicode, that is, restricted to the Basic Multilingual Plane of modern 21-bit Unicode). Thus, a console window can present almost all kinds of text.

However, for single-byte-per-character i/o (and possibly also for some variable-length encodings), Windows automatically translates to and from the console window's active code page. If the console window belongs to an instance of [cmd.exe], you can inspect the active code page with the chcp command. Like this:

 C:\test> chcp
 Active code page: 850

 C:\test> _

Codepage 850 is an encoding based on the original IBM PC English codepage 437. 850 is the default for console windows on at least Norwegian PCs (although savvy Norwegians may change that to 865). None of these, however, is a codepage you should use.

The original IBM PC codepage (character encoding) is known as OEM, which is a meaningless acronym for Original Equipment Manufacturer. It had nice line-drawing characters suitable for the original PC's text mode screen. More generally, OEM means the default code page for console windows, where codepage 437 is just the original one: it can be configured, e.g. per window via chcp.

When Microsoft created 16-bit Windows, they chose another encoding, known in Windows as ANSI. The original one was an extension of ISO Latin-1, which for a long time was the default on the Internet (it is unclear which came first, however: Microsoft participated in the standardization). This original ANSI is now known as Windows ANSI Western.

ANSI is the code page used for non-Unicode text in almost all the rest of Windows. Console windows use OEM. Notepad, other editors and so on use ANSI.

Then, when Microsoft made Windows 32-bit, they adopted a 16-bit extension of Latin-1 known as Unicode. Microsoft was an original founding member of the Unicode Consortium. The basic API, including console windows, the file system and so on, was rewritten to use Unicode. For backward compatibility there is a translation layer that translates between OEM and Unicode for console windows, and between ANSI and Unicode for other functionality. For example, MessageBoxA is an ANSI wrapper for the Unicode-based MessageBoxW.

The practical upshot is that on Windows your C++ source code is typically encoded as ANSI, while console windows assume OEM. As a result, a statement such as

 cout << "I like Norwegian blåbærsyltetøy!" << endl;

produces pure gobbledegook. You can use the Unicode-based console window APIs to output Unicode directly to the console window, avoiding the translation, but that is awkward.

Please note that using wcout instead of cout does not help: by design, wcout just translates down from wide character strings to the program's narrow character set, discarding information on the way. It can be hard to believe that the C++ standard library offers a rather big chunk of very complex functionality that is meaningless (since those conversions could instead just have been supported by cout). But so it is: it is just meaningless. Possibly it was some kind of political compromise, but in any case wcout does not help, even though, if it were meaningful in some way, it "should" logically help with this.

So, how does a Norwegian novice programmer get, for example, "blåbærsyltetøy" presented correctly?

Well, simply by changing the active code page to ANSI. Since ANSI is codepage 1252 on most PCs in Western countries, you can do this for a given command interpreter instance with

 C:\test> chcp 1252
 Active code page: 1252

 C:\test> _

Now old DOS programs, for example [edit.com] (still present in Windows XP!), will produce some gobbledegook, because the original PC character set's line-drawing characters do not exist in ANSI, and because national characters have different codes in ANSI. But hey, who uses old DOS programs? Not me!

If you want this to be a more permanent code page, you will have to reconfigure console windows via an undocumented registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage

In this key, change the OEMCP value to 1252 and reboot.

As with chcp, or any other change of codepage to 1252, this makes old DOS programs present gobbledegook, but it makes C++ programs and other modern console programs work fine.

That is because you then have the same character encoding in console windows as in the rest of Windows.
