I noticed that there are two compiler flags in the Visual Studio compiler (for C++) called MBCS and UNICODE. What is the difference between the two?
Many functions in the Windows API come in two versions: one that takes char parameters (in a locale-specific code page), and one that takes wchar_t parameters (in UTF-16).
    int MessageBoxA(HWND hWnd, const char* lpText, const char* lpCaption, unsigned int uType);
    int MessageBoxW(HWND hWnd, const wchar_t* lpText, const wchar_t* lpCaption, unsigned int uType);
Each of these pairs of functions also has a macro without a suffix, which depends on whether the UNICODE macro is defined.
    #ifdef UNICODE
    #define MessageBox MessageBoxW
    #else
    #define MessageBox MessageBoxA
    #endif
To go with this, the TCHAR type is defined to abstract away the character type used by the API functions.
    #ifdef UNICODE
    typedef wchar_t TCHAR;
    #else
    typedef char TCHAR;
    #endif
This, however, was a bad idea. You should always specify the character type explicitly.
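To illustrate why the implicit switch is awkward, here is a minimal portable sketch of the same dispatch pattern. The names CountUnitsA, CountUnitsW, DEMO_UNICODE, DEMO_TCHAR and DEMO_TEXT are made up for this example and are not part of the Windows headers; they only mimic how the real macros behave.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for an "A"/"W" function pair (not real Windows APIs).
std::size_t CountUnitsA(const char* s)    { return std::char_traits<char>::length(s); }
std::size_t CountUnitsW(const wchar_t* s) { return std::char_traits<wchar_t>::length(s); }

// Same dispatch pattern as the Windows headers, driven by a made-up switch.
#define DEMO_UNICODE
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT(s) L##s
#define CountUnits CountUnitsW
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT(s) s
#define CountUnits CountUnitsA
#endif

// Generic-looking code: its real types depend on a macro defined somewhere else.
std::size_t GenericLength() {
    const DEMO_TCHAR* text = DEMO_TEXT("hello");
    return CountUnits(text);
}

// Explicit code: the character type is visible at the call site.
std::size_t ExplicitLength() {
    return CountUnitsW(L"hello");
}
```

Removing the `#define DEMO_UNICODE` line silently changes the type of `text` in GenericLength, while ExplicitLength keeps meaning the same thing.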
I don't understand how UTF-8 is conceptually different from an MBCS encoding?
MBCS stands for Multibyte Character Set. Taken literally, it seems that UTF-8 would qualify.
But on Windows, "MBCS" refers only to the character encodings that can be used with the "A" versions of the Windows API functions. This includes code pages 932 (Shift_JIS), 936 (GBK), 949 (KS_C_5601-1987) and 950 (Big5), but NOT UTF-8.
To use UTF-8, you need to convert the string to UTF-16 using MultiByteToWideChar, call the "W" version of the function, and call WideCharToMultiByte on the output. This is essentially what the "A" functions themselves do, which makes me wonder why Windows doesn't just support UTF-8.
This inability to support the most common character encoding makes the "A" versions of the Windows API useless. Therefore, you should always use the "W" functions.
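To make the conversion step concrete, here is a rough portable sketch of what a UTF-8 to UTF-16 conversion involves. On Windows you would normally call MultiByteToWideChar with CP_UTF8 rather than decode by hand; the function name Utf8ToUtf16 is invented for this example, and the sketch assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 into UTF-16 code units.
// Assumption: the input is valid UTF-8; malformed sequences are not detected.
std::u16string Utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte tells us how long the sequence is.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The result is what you would hand to a "W" function (as wchar_t on Windows); the reverse direction, which WideCharToMultiByte covers, is the same bit shuffling in the other order.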
Unicode is a 16-bit character encoding
This contradicts everything I've read about Unicode.
MSDN is wrong. Unicode is a 21-bit coded character set that has several encodings, the most common of which are UTF-8, UTF-16, and UTF-32. (There are other Unicode encodings such as GB18030, UTF-7, and UTF-EBCDIC.)
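As a small check of the "16-bit" claim, the sketch below counts how many code units a single code point needs in each of the three common encodings, using standard C++ string literals. The helper name CodeUnits is made up for this example.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t CodeUnits(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 fits in 16 bits: one UTF-16 unit, but already two UTF-8 bytes.
static_assert(CodeUnits(u8"\u00E9") == 2, "UTF-8 uses two bytes");
static_assert(CodeUnits(u"\u00E9") == 1, "UTF-16 uses one unit");
static_assert(CodeUnits(U"\u00E9") == 1, "UTF-32 uses one unit");

// U+1F600 does not fit in 16 bits: UTF-16 needs a surrogate pair.
static_assert(CodeUnits(u8"\U0001F600") == 4, "UTF-8 uses four bytes");
static_assert(CodeUnits(u"\U0001F600") == 2, "UTF-16 uses two units");
static_assert(CodeUnits(U"\U0001F600") == 1, "UTF-32 uses one unit");
```

Only UTF-32 maps every code point to exactly one code unit, so a 16-bit unit on its own cannot represent all of Unicode.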
Whenever Microsoft refers to "Unicode", they really mean UTF-16 (or UCS-2). This is for historical reasons. Windows NT was an early adopter of Unicode, back when 16 bits was thought to be enough for everyone, and UTF-8 was only used on Plan 9. So at that time, UCS-2 was Unicode.