C/C++ Encoding Questions

Question

I have a few questions trying to understand different encodings.

What is the default encoding for strings?

char ascii[] = "Some text"; // This is plain ASCII, right?
wchar_t utf[] = L"Some Text"; // Is this UTF-16? Or ASCII stored in wchar_t's?
MessageBoxW(NULL, L"Hello", L"HI", MB_OK); // What encodings are the 2 strings in?

And then, how would I create a UTF-8 string, for example if I wanted to display UTF-8 characters in a MessageBox?

My questions are mainly directed to Windows, by the way, but if they are different in different OSs, I am interested to know.

c++ string encoding unicode

Josh Mar 15 '12 at 5:20

1 answer

Jerry Coffin · Accepted Answer · 2012-03-15T05:25:58+0000

The standard does not specify the encoding for either narrow or wide strings. Typically the compiler vendor picks something unsurprising for the target machine, but it is hard to say more than that. For example, a narrow string will most likely use ASCII (or, really, something like ISO 8859) on most personal computers, but EBCDIC on an IBM mainframe.

Wide character strings also vary - for example, most compilers on Windows will use UTF-16. On Linux, UTF-32 / UCS-4 is probably more common.
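
If you want to see what your own compiler does, one rough check is to look at sizeof(wchar_t) and at how many wchar_t units a character outside the Basic Multilingual Plane takes. The sketch below is only a heuristic, since the standard promises none of this, and the helper names guess_wide_encoding and wide_units_for_emoji are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: guess the wide encoding from the size of wchar_t.
// This is a heuristic only; the standard does not guarantee any encoding.
inline std::string guess_wide_encoding() {
    if (sizeof(wchar_t) == 2) return "probably UTF-16 (typical on Windows)";
    if (sizeof(wchar_t) == 4) return "probably UTF-32 (typical on Linux)";
    return "unknown";
}

// Hypothetical helper: number of wchar_t units the compiler used for
// U+1F600, not counting the terminating null. The character is written as a
// universal character name so the source file's own encoding does not matter.
inline std::size_t wide_units_for_emoji() {
    const wchar_t s[] = L"\U0001F600";
    const std::size_t units = sizeof(s) / sizeof(s[0]) - 1;
    assert(units >= 1);  // the literal holds at least one unit
    return units;
}
```

With a 16-bit wchar_t you would expect two units (a surrogate pair), and with a 32-bit wchar_t a single unit.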

The mention of MessageBox suggests Windows, where (you guessed it) you will usually get UTF-16 for wide strings. In that case, if you are explicitly passing wide strings, you also want to explicitly call the wide version of the function, MessageBoxW.

As for creating a UTF-8 string literal, about all I can say is “good luck”. That would be up to Visual Studio, but if there is a way to do it, I don't know of it.
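
One approach that does not rely on compiler support is to spell out the UTF-8 bytes with hex escapes in an ordinary narrow literal and convert to UTF-16 just before calling a wide API. The following is a minimal sketch under that assumption, not a definitive implementation: utf8_to_utf16 and sample_utf8 are hypothetical names, and the decoder does only basic validation (it does not reject overlong forms).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Rejects bad lead bytes, truncated sequences, bad continuation bytes,
// surrogate code points, and values above U+10FFFF.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        assert(i <= in.size());  // never read past the end

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split into a high and low surrogate.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// "café €" with U+00E9 and U+20AC written as explicit UTF-8 bytes,
// so the result does not depend on the source or execution character set.
inline std::string sample_utf8() {
    return "caf\xC3\xA9 \xE2\x82\xAC";
}
```

On Windows, where wchar_t is 16 bits, the resulting code units are what MessageBoxW expects; in practice the Win32 function MultiByteToWideChar with CP_UTF8 performs the same conversion. Compilers that support the C++11 u8 prefix can also produce UTF-8 literals directly.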
