How do I work with Unicode strings in C/C++ in a cross-platform way?

On non-Windows platforms, you can easily use char * strings and treat them as UTF-8.

The problem is that on Windows you have to receive and send messages using wchar_t * (W) strings. If you use the ANSI (A) functions, you will not support Unicode.

So, if you want to write a truly portable application, you need to compile it as Unicode on Windows.

Now, to keep the code clean, I would like to know the recommended way to work with strings, one that minimizes ugliness in the code.

String types you may need to deal with: std::string , std::wstring , std::tstring , char * , wchar_t * , TCHAR* , CString (ATL one).

Problems you may encounter:

  • cout/cerr/cin and their wide variants wcout/wcerr/wcin
  • all the renamed wide string functions and their TCHAR macros, for example strcmp , wcscmp and _tcscmp
  • string literals in the code: with TCHAR you have to litter your code with _T() macros

Which approach do you think is best? (examples are welcome)

Personally, I would like to use std::tstring , but I would like to see how to handle the conversions where they are needed.

3 answers

I suggest you check out this library: http://cppcms.sourceforge.net/boost_locale/docs/
It may help. It is a candidate for inclusion in Boost, and I believe it will be accepted.


You can keep all your strings in UTF-8 encoding and convert them to UTF-16 only just before interacting with the Win32 API. Take a look at UTF8-CPP for some easy-to-use conversion functions.
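
As a rough illustration of that conversion step, here is a minimal, self-contained sketch in standard C++11 that does not depend on UTF8-CPP. The function name utf8_to_utf16 is my own and not part of any library; in real code you would probably prefer UTF8-CPP's utf8::utf8to16 or another well-tested routine. It rejects malformed input by throwing, which is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 string and re-encode it as UTF-16.
// Throws std::invalid_argument if the input is not well-formed UTF-8.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length;
    // anything below it is an overlong encoding.
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        assert(len >= 1 && len <= 4);
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units can be copied into a std::wstring and passed to the W functions.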


If you are writing portable code:

First, never use wchar_t : it is not portable, and its encoding differs between platforms (UTF-16 on Windows, UTF-32 on most others).

Never use TCHAR; use plain std::string encoded as UTF-8.

When you have to deal with the brain-damaged Win32 API, just convert the UTF-8 string to UTF-16 before calling it.
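
One possible shape for such a wrapper is sketched below, using file opening as the example. The names widen and open_utf8 are hypothetical helpers of my own; MultiByteToWideChar and _wfopen are the actual Win32/CRT calls. On other platforms the sketch assumes the UTF-8 path can be handed to fopen unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: UTF-8 std::string -> UTF-16 std::wstring via Win32.
std::wstring widen(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int len = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units are needed.
    const int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), len, nullptr, 0);
    if (needed == 0) throw std::invalid_argument("input is not valid UTF-8");
    std::wstring out(static_cast<std::size_t>(needed), L'\0');
    // Second call: convert into the buffer we just sized.
    const int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), len, &out[0], needed);
    assert(written == needed);
    (void)written;
    return out;
}
#endif

// Hypothetical wrapper: the rest of the program only ever sees UTF-8.
std::FILE* open_utf8(const std::string& utf8_path, const std::string& mode) {
#ifdef _WIN32
    // Convert at the API boundary and call the wide (W) variant.
    return _wfopen(widen(utf8_path).c_str(), widen(mode).c_str());
#else
    return std::fopen(utf8_path.c_str(), mode.c_str());
#endif
}
```

With wrappers like this, the #ifdef lives in one place and callers keep using std::string everywhere.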

See https://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful for more on this, including how a Windows project can use UTF-8 as its primary encoding.

