UTF-16 codecvt facet

This is a continuation of this locale issue, also described in this question: what I really want to do is set the codecvt facet of a locale to one that understands UTF-16 files.

I could write my own. But I am not a UTF expert, so I am sure I would get it only almost right, and it would break at the most inconvenient time. So I was wondering if there are any resources (on the Internet) for pre-built codecvt (or other) facets that can be used in C++ and that have been written and tested by experts?

The reason is that the default locale (on my Mac OS X 10.6 system), when reading a file, simply converts 1 byte to 1 wchar_t with no actual conversion. As a result, UTF-16 encoded files are converted to wstrings that contain lots of null characters ('\0').
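
A minimal sketch of the symptom, assuming a little-endian UTF-16 file named utf16.txt (a hypothetical name) containing the text "AB", i.e. the bytes 41 00 42 00:

    #include <fstream>
    #include <iterator>
    #include <string>

    int main() {
        // With the default locale, each byte widens to one wchar_t, so the
        // file reads back as L"A\0B\0" instead of L"AB".
        std::wifstream in("utf16.txt");
        std::wstring contents((std::istreambuf_iterator<wchar_t>(in)),
                              std::istreambuf_iterator<wchar_t>());
        // contents.size() == 4 here, with embedded L'\0' characters.
        return 0;
    }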

2 answers

I'm not sure whether by "resources on the Internet" you meant free ones, but there is the Dinkumware Conversion Library, which looks like it will fit your needs, provided the library can be integrated with your compiler.

The codecvt types are described in its Code Conversions section.


With C++11, there are additional standard codecvt specializations and types designed to convert between the various UTF-x and UCS-x character sequences; one of these may satisfy your needs.

In <locale>:

  • std::codecvt<char16_t, char, std::mbstate_t>: converts between UTF-16 and UTF-8 (see the sketch after this list).
  • std::codecvt<char32_t, char, std::mbstate_t>: converts between UTF-32 and UTF-8.
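
These <locale> specializations have a protected destructor, so std::wstring_convert cannot delete them directly; a small destructible wrapper is needed. A sketch following the well-known cppreference pattern (the byte string is "zß水" in UTF-8, chosen only for illustration):

    #include <locale>
    #include <string>

    // Makes a facet with a protected destructor destructible, so that
    // std::wstring_convert can own it.
    template <class Facet>
    struct deletable_facet : Facet {
        using Facet::Facet;  // inherit constructors
        ~deletable_facet() {}
    };

    int main() {
        std::wstring_convert<
            deletable_facet<std::codecvt<char16_t, char, std::mbstate_t>>,
            char16_t> conv;

        // "\x7a\xc3\x9f\xe6\xb0\xb4" is "z", U+00DF, U+6C34 in UTF-8.
        std::u16string utf16 = conv.from_bytes("\x7a\xc3\x9f\xe6\xb0\xb4");
        std::string back = conv.to_bytes(utf16);  // round-trips to UTF-8
        return 0;
    }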

In <codecvt>:

  • std::codecvt_utf8_utf16<typename Elem>: converts between UTF-8 and UTF-16, where UTF-16 code units are stored as the specified Elem (note that if char32_t is specified, only one code unit will be stored per char32_t); a usage sketch follows this list.
    • It has two additional defaulted template parameters (unsigned long MaxCode = 0x10ffff and std::codecvt_mode Mode = (std::codecvt_mode)0) and inherits from std::codecvt<Elem, char, std::mbstate_t>.
  • std::codecvt_utf8<typename Elem>: converts between UTF-8 and UCS-2 or UCS-4, depending on Elem (UCS-2 for char16_t, UCS-4 for char32_t, platform-dependent for wchar_t).
    • It has two additional defaulted template parameters (unsigned long MaxCode = 0x10ffff and std::codecvt_mode Mode = (std::codecvt_mode)0) and inherits from std::codecvt<Elem, char, std::mbstate_t>.
  • std::codecvt_utf16<typename Elem>: converts between UTF-16 and UCS-2 or UCS-4, depending on Elem (UCS-2 for char16_t, UCS-4 for char32_t, platform-dependent for wchar_t).
    • It has two additional defaulted template parameters (unsigned long MaxCode = 0x10ffff and std::codecvt_mode Mode = (std::codecvt_mode)0) and inherits from std::codecvt<Elem, char, std::mbstate_t>.
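
Unlike the <locale> specializations above, these <codecvt> types have public destructors, so std::wstring_convert can own one directly with no wrapper. A minimal sketch; the sample bytes (U+1F600, a non-BMP code point) are purely illustrative:

    #include <codecvt>
    #include <locale>
    #include <string>

    int main() {
        // UTF-8 <-> UTF-16 via codecvt_utf8_utf16.
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;

        // "\xf0\x9f\x98\x80" is U+1F600 in UTF-8 (4 bytes).
        std::u16string utf16 = conv.from_bytes("\xf0\x9f\x98\x80");

        // A non-BMP code point occupies two char16_t code units
        // (a surrogate pair).
        return utf16.size() == 2 ? 0 : 1;
    }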

codecvt_utf8 and codecvt_utf16 convert between the specified UTF encoding and UCS-2 or UCS-4, depending on the size of Elem. Thus, wchar_t will mean UCS-2 on systems where it is 16 to 31 bits wide (e.g., Windows, where it is 16-bit), or UCS-4 on systems where it is at least 32 bits wide (e.g., Linux, where it is 32-bit), regardless of whether wchar_t strings actually use that encoding; on platforms that use a different encoding for wchar_t strings, this is likely to cause problems if you are not careful.
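
To come back to the original question of reading a UTF-16 file into wstrings, one common approach is to imbue the stream with a codecvt_utf16 facet. A sketch assuming a little-endian UTF-16 file named utf16.txt (a placeholder name) whose BOM, if present, should be skipped:

    #include <codecvt>
    #include <fstream>
    #include <locale>
    #include <string>

    int main() {
        // Open in binary mode: codecvt_utf16 treats the file as a byte
        // sequence, not text.
        std::wifstream in("utf16.txt", std::ios::binary);
        in.imbue(std::locale(in.getloc(),
            new std::codecvt_utf16<wchar_t, 0x10ffff,
                std::codecvt_mode(std::consume_header | std::little_endian)>));

        std::wstring line;
        while (std::getline(in, line)) {
            // line now holds decoded characters rather than raw bytes
            // padded with L'\0'. On Windows (16-bit wchar_t) this is
            // limited to the BMP, as described above.
        }
        return 0;
    }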

For more information, see the cppreference documentation for these types.

Note that the <codecvt> header was only added to libstdc++ relatively recently. If you are using an older version of Clang or GCC, you may need to use libc++ if you want to use it.
Also note that versions of Visual Studio prior to 2015 do not properly support char16_t and char32_t; where these types exist in earlier versions, they are typedefs for unsigned short and unsigned int, respectively. Older versions of Visual Studio can also have trouble converting strings between UTF encodings, and Visual Studio 2015 has a bug that prevents codecvt from working with char16_t and char32_t, requiring the use of integral types of the same size instead.
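
For illustration only, a commonly cited shape of that Visual Studio 2015 workaround, instantiating the facet with int16_t instead of char16_t. This relies on MSVC accepting an integral Elem, which is not guaranteed by the standard, so treat it as a non-portable sketch:

    #include <codecvt>
    #include <cstdint>
    #include <locale>
    #include <string>

    int main() {
        // Use a same-size integral type where char16_t fails to link,
        // then reinterpret the result as char16_t data. MSVC-specific.
        std::wstring_convert<std::codecvt_utf8_utf16<std::int16_t>,
                             std::int16_t> conv;
        std::basic_string<std::int16_t> tmp = conv.from_bytes("example");
        std::u16string utf16(reinterpret_cast<const char16_t*>(tmp.data()),
                             tmp.size());
        return 0;
    }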

