Storing a Unicode UTF-8 string in std::string

This follows on from the discussions in

Cross-platform strings (and Unicode) in C++

How to work with Unicode strings in C/C++ in a cross-platform way?

I am trying to assign a UTF-8 string to a std::string variable in Visual Studio 2010:

std::string msg = "महसुस";

However, when I look at the string in the debugger, I only see "?????". The file is saved as Unicode (UTF-8 with signature), and the project uses the "Use Unicode Character Set" setting.

"महसुस" is Nepali; it contains 5 characters and occupies 15 bytes in UTF-8. But the Visual Studio debugger shows the size of msg as 5.

My question is:

How can I use std::string purely to store UTF-8, without needing to manipulate it?

+7
c++ stdstring windows unicode utf-8
5 answers

If you were using C++11, this would be easy:

 std::string msg = u8"महसुस"; 

But since you are not, you can use escape sequences instead of relying on the source file's encoding to produce the right bytes. This also makes your code more portable (for example, if the file is accidentally saved in an encoding other than UTF-8):

 std::string msg = "\xE0\xA4\xAE\xE0\xA4\xB9\xE0\xA4\xB8\xE0\xA5\x81\xE0\xA4\xB8"; // "महसुस" 
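The escaped literal can be sanity-checked with a small sketch like the one below. It assumes the string holds valid UTF-8; the helper names countCodePoints and sampleWord are made up for illustration and are not part of any library. std::string::size() reports bytes, while the number of code points is obtained by skipping UTF-8 continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by counting every byte that
// is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string &utf8) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// The word from the question, written as explicit UTF-8 bytes so that the
// result does not depend on how the source file is saved.
std::string sampleWord() {
    return "\xE0\xA4\xAE\xE0\xA4\xB9\xE0\xA4\xB8\xE0\xA5\x81\xE0\xA4\xB8";
}
```

With these helpers, sampleWord().size() is 15 (bytes) and countCodePoints(sampleWord()) is 5, which matches the figures given in the question.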

Otherwise, you can instead perform the conversion at runtime:

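 #include <windows.h>
 #include <string>

 // Convert a UTF-16 wide string to a UTF-8 encoded std::string (Windows only).
 std::string toUtf8(const std::wstring &str)
 {
     std::string ret;
     // First call: ask how many bytes the UTF-8 output needs.
     int len = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), NULL, 0, NULL, NULL);
     if (len > 0)
     {
         ret.resize(len);
         // Second call: write the converted bytes into the buffer.
         WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), &ret[0], len, NULL, NULL);
     }
     return ret;
 }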

 std::string msg = toUtf8(L"महसुस"); 
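WideCharToMultiByte exists only on Windows. For cross-platform code, one possible substitute (not the approach used above) is a hand-written encoder that turns code points into UTF-8 bytes. The sketch below assumes a C++11 compiler, since it uses std::u32string, which Visual Studio 2010 does not provide. The name encodeUtf8 is hypothetical.

```cpp
#include <string>

// Encode a sequence of Unicode code points as UTF-8.
// Assumption: every input value is a valid Unicode scalar value
// (at most 0x10FFFF and not a surrogate); this sketch does not validate.
std::string encodeUtf8(const std::u32string &codePoints) {
    std::string out;
    for (char32_t cp : codePoints) {
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, encodeUtf8(U"\u092E\u0939\u0938\u0941\u0938") produces the same 15 bytes as the escaped literal shown earlier, because universal character names in a U"" literal do not depend on the source file encoding.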
+8

If you have C++11, you can write u8"महसुस". Otherwise, you will need to write the actual byte sequence, using \xNN for each byte in the UTF-8 sequence.

Generally, you are better off reading such text from a configuration file.
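As a rough sketch of the configuration-file approach: read the file in binary mode so the bytes reach std::string unchanged, and drop a leading UTF-8 signature (BOM) if an editor added one. The function name readUtf8File is made up for this example, and the sketch assumes the file really is UTF-8 and fits in memory.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file into a std::string without any character conversion.
// Returns an empty string if the file cannot be opened.
// Assumption: the file content is already UTF-8 encoded.
std::string readUtf8File(const std::string &path) {
    // Binary mode avoids newline translation and keeps the bytes intact.
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    // Strip the UTF-8 signature (EF BB BF) if present.
    if (text.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        text.erase(0, 3);
    }
    return text;
}
```

The returned string can then be stored and passed around as-is; the source file encoding and compiler settings no longer affect the text.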

+4

You can enter msg.c_str(),s8 in the Watch window to see the UTF-8 string displayed correctly.

+3

The 's8' format specifier makes the debugger display the correct value. If we append ',s8' to a variable name, Visual Studio interprets the text as UTF-8 and displays it correctly.

If you are using Microsoft Visual Studio 2008 Service Pack 1 (SP1), you must first install the following hotfix:

http://support.microsoft.com/kb/980263

+1

If you set the system locale to English and the file is saved as UTF-8 without a BOM, VC lets you store the string as is. I wrote an article about it here.


+1
