With C++11, there are additional standard codecvt specializations and types designed to convert between various UTF-x and UCS-x character sequences; one of them may suit your needs.
In <locale>:
- std::codecvt<char16_t, char, std::mbstate_t>: converts between UTF-16 and UTF-8.
- std::codecvt<char32_t, char, std::mbstate_t>: converts between UTF-32 and UTF-8.
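As a rough illustration of the <locale> specializations, the sketch below fetches the char16_t facet from the classic locale and calls its in() member to decode UTF-8 into UTF-16. The helper name utf8_to_utf16_facet is made up for this example, and error handling is kept to a single exception; treat it as a starting point rather than a complete solution.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the standard library):
// decodes UTF-8 into UTF-16 using the facet from <locale>.
std::u16string utf8_to_utf16_facet(const std::string& utf8) {
    if (utf8.empty()) return std::u16string();

    using Facet = std::codecvt<char16_t, char, std::mbstate_t>;
    // The facet's destructor is protected, so borrow it from a locale
    // instead of constructing one directly.
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    std::mbstate_t state{};
    // UTF-16 never needs more code units than UTF-8 needs bytes.
    std::u16string out(utf8.size(), u'\0');
    const char* from_next = nullptr;
    char16_t* to_next = nullptr;

    const std::codecvt_base::result res = facet.in(
        state,
        utf8.data(), utf8.data() + utf8.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (res != std::codecvt_base::ok) {
        throw std::runtime_error("invalid or incomplete UTF-8 input");
    }
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling utf8_to_utf16_facet("\xC3\xA9") should give a one-element std::u16string holding U+00E9, while a four-byte UTF-8 sequence comes back as a surrogate pair.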
In <codecvt>:
- std::codecvt_utf8_utf16<typename Elem>: converts between UTF-8 and UTF-16, where the UTF-16 code units are stored as the specified Elem (note that if char32_t is specified, only one code unit is stored per char32_t).
  - It has two additional defaulted template parameters (unsigned long MaxCode = 0x10ffff and std::codecvt_mode Mode = (std::codecvt_mode)0) and inherits from std::codecvt<Elem, char, std::mbstate_t>.
- std::codecvt_utf8<typename Elem>: converts between UTF-8 and UCS2 or UCS4, depending on Elem (UCS2 for char16_t, UCS4 for char32_t, platform-dependent for wchar_t).
  - It has two additional defaulted template parameters (unsigned long MaxCode = 0x10ffff and std::codecvt_mode Mode = (std::codecvt_mode)0) and inherits from std::codecvt<Elem, char, std::mbstate_t>.
- std::codecvt_utf16<typename Elem>: converts between UTF-16 and UCS2 or UCS4, depending on Elem (UCS2 for char16_t, UCS4 for char32_t, platform-dependent for wchar_t).
  - It has two additional defaulted template parameters (unsigned long MaxCode = 0x10ffff and std::codecvt_mode Mode = (std::codecvt_mode)0) and inherits from std::codecvt<Elem, char, std::mbstate_t>.
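A common way to use the <codecvt> facets is through std::wstring_convert (declared in <locale>). The sketch below wraps std::codecvt_utf8_utf16<char16_t> in two small helpers; the function names are invented for illustration. Keep in mind that <codecvt> and std::wstring_convert were deprecated in C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helpers (names chosen for this example only).

// UTF-8 bytes -> UTF-16 code units.
// from_bytes throws std::range_error on malformed input.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// UTF-16 code units -> UTF-8 bytes.
// to_bytes throws std::range_error on malformed input (e.g. a lone surrogate).
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}
```

A round trip such as utf16_to_utf8(utf8_to_utf16(s)) should return s unchanged for any valid UTF-8 string s.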
codecvt_utf8 and codecvt_utf16 convert between the specified UTF and either UCS2 or UCS4, depending on the size of Elem. Thus, wchar_t implies UCS2 on systems where it is 16 to 31 bits wide (for example, Windows, where it is 16-bit), or UCS4 on systems where it is at least 32 bits wide (for example, Linux, where it is 32-bit), regardless of whether wchar_t strings actually use that encoding; on platforms that use a different encoding for wchar_t strings, this is likely to cause problems if you are not careful.
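To make the UCS2 limitation concrete, here is a small sketch that feeds UTF-8 through std::codecvt_utf8<char16_t>. Because UCS2 has no surrogate pairs, code points above U+FFFF cannot be represented and the conversion is expected to fail. The helper name fits_in_ucs2 is hypothetical, and the same restriction would apply to wchar_t on platforms where it is 16 bits wide.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if the input is valid UTF-8 and every
// code point fits in a single 16-bit UCS2 unit (i.e. lies in the BMP).
bool fits_in_ucs2(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    try {
        conv.from_bytes(utf8);   // throws std::range_error on failure
        return true;
    } catch (const std::range_error&) {
        return false;            // out of UCS2 range, or malformed UTF-8
    }
}
```

For instance, fits_in_ucs2("\xC3\xA9") (U+00E9) is expected to succeed, whereas fits_in_ucs2("\xF0\x9F\x98\x80") (U+1F600) is expected to fail; switching to std::codecvt_utf8_utf16<char16_t> would instead produce a surrogate pair.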
For more information, see the C++ reference entries for std::codecvt and the facets in <codecvt>.
Note that the <codecvt> header was only added to libstdc++ relatively recently. If you are using an older version of Clang or GCC, you may need to use libc++ if you want to use it.
Please note that versions of Visual Studio before 2015 do not actually support char16_t and char32_t; where these types exist in earlier versions, they are merely typedefs for unsigned short and unsigned int, respectively. Also note that older versions of Visual Studio sometimes have problems converting strings between UTF encodings, and that Visual Studio 2015 has a bug that prevents codecvt from working with char16_t and char32_t, so you have to use integral types of the same size instead.
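One possible shape for that Visual Studio workaround is sketched below. It assumes that instantiating the facet with std::int16_t instead of char16_t avoids the problem on affected compilers, and that the _MSC_VER >= 1900 check selects them; both are assumptions that should be verified against the toolchain in use. The helper name utf16_to_utf8_portable is made up for this example.

```cpp
#include <codecvt>
#include <cstdint>
#include <locale>
#include <string>

// Hypothetical helper: UTF-16 -> UTF-8, with a fallback for Visual Studio.
std::string utf16_to_utf8_portable(const std::u16string& utf16) {
#if defined(_MSC_VER) && _MSC_VER >= 1900
    // Assumption: on affected Visual Studio versions the char16_t facet
    // does not work, so use a 16-bit integral type of the same size.
    std::wstring_convert<std::codecvt_utf8_utf16<std::int16_t>, std::int16_t> conv;
    const std::int16_t* first =
        reinterpret_cast<const std::int16_t*>(utf16.data());
    return conv.to_bytes(first, first + utf16.size());
#else
    // Elsewhere, the char16_t facet can be used directly.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
#endif
}
```

Keeping the preprocessor check inside one function means the rest of the code can keep using std::u16string regardless of the compiler.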