Convert UTF-8 text to wchar_t

Question

I know this question has been asked here several times, and I have read some of the answers. But several different solutions were suggested, and I'm trying to figure out which of them is best.

I am writing a C99 application that basically receives UTF-8 encoded XML text.

Part of its job is to copy and manipulate that string (find a substring, concatenate, etc.).

Since I would prefer not to use an external non-standard library right now, I am trying to implement it with wchar_t.

Currently I am using mbstowcs to convert it to wchar_t for easy manipulation, and for the inputs I tried in several languages it worked fine.

The thing is, I have read that some people had problems with UTF-8 and mbstowcs, so I would like to hear whether this use is permitted/acceptable.

Another option I came across is using iconv with the WCHAR_T parameter. The thing is, I am working on a platform (not a PC) whose locale support is very limited: only the ANSI C locale is available. What about that?

I also came across a C++ library which is very popular, but I am limited to a C99 implementation.

Also, I will be compiling this code on another platform where the size of wchar_t is different (2 bytes versus 4 bytes on my machine). How can I overcome this? By using containers with a fixed char size? But then, which manipulation functions should I use instead?

Happy to hear some thoughts. Thanks.

+4

c utf-8 wchar-t

Johnny guitar, Jan 14 '14 at 18:18

2 answers

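"Also, I would compile this code on a different platform, where the sizeof of wchar_t is different (2 bytes versus 4 bytes). How can I overcome this? Using containers with a fixed char size?"

You can use conditional typedefs, like this: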

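#include <stddef.h>   /* wchar_t */
#include <stdint.h>   /* uint16_t, uint32_t */

#if defined(__STDC_UTF_16__) || defined(__STDC_UTF_32__)
   /* C11 char16_t / char32_t; assumes <uchar.h> exists when the macros do */
   #include <uchar.h>
#endif

#if defined(__STDC_UTF_16__)
   typedef char16_t  CHAR16;   /* dedicated UTF-16 code unit type */
#elif defined(_WIN32)
   typedef wchar_t   CHAR16;   /* wchar_t is 16 bits wide on Windows */
#else
   typedef uint16_t  CHAR16;   /* fixed-width fallback */
#endif

#if defined(__STDC_UTF_32__)
   typedef char32_t  CHAR32;   /* dedicated UTF-32 code unit type */
#elif defined(__STDC_ISO_10646__)
   typedef wchar_t   CHAR32;   /* wchar_t holds ISO 10646 code points */
#else
   typedef uint32_t  CHAR32;   /* fixed-width fallback */
#endif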

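This defines the typedefs CHAR16 and CHAR32 to use the new character types introduced with C++11 if they are available, falling back to wchar_t where possible and to fixed-width unsigned integers otherwise.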
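As for which manipulation functions to use with fixed-width units: the standard wcs* functions only operate on wchar_t, so one option is to write a few small helpers over the fixed-width type. A minimal sketch, assuming CHAR32 is plain uint32_t and strings are zero-terminated; u32_len and u32_find are hypothetical names, not part of any library:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the CHAR32 typedef (assumption: plain uint32_t). */
typedef uint32_t CHAR32;

/* Hypothetical helper: length of a zero-terminated CHAR32 string, in units. */
size_t u32_len(const CHAR32 *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: first occurrence of needle in hay, or NULL (like strstr). */
const CHAR32 *u32_find(const CHAR32 *hay, const CHAR32 *needle)
{
    size_t nlen = u32_len(needle);
    if (nlen == 0)
        return hay;
    for (; *hay != 0; hay++) {
        size_t i = 0;
        /* A mismatch at hay's terminator stops the scan before reading past it. */
        while (i < nlen && hay[i] == needle[i])
            i++;
        if (i == nlen)
            return hay;
    }
    return NULL;
}
```

Because every unit is a whole code point in this representation, searching and concatenation reduce to plain array operations, at the cost of four bytes per character.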

+1

dan04, Jan 14 '14 at 20:23

McDowell · Accepted Answer · 2014-01-14T20:40:18+0000

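C does not define which encoding the char and wchar_t types use, and the standard library only mandates a few functions that convert between the two, without saying how. If the implementation-dependent encoding of char is not UTF-8, mbstowcs will corrupt the data.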
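To illustrate the locale dependence, here is a small sketch around mbstowcs. The wrapper name locale_to_wcs is hypothetical, and the locale name "C.UTF-8" is an assumption: locale names are platform-specific, and on a platform that offers only the "C" locale the call simply fails.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte string to wide characters with mbstowcs.
 * The bytes are interpreted in the encoding of the current LC_CTYPE locale,
 * so UTF-8 input is only decoded as UTF-8 if that locale is a UTF-8 one.
 * Returns the number of wide characters written (excluding the terminator),
 * or (size_t)-1 on error.
 */
size_t locale_to_wcs(wchar_t *dst, size_t dst_len, const char *src)
{
    size_t n;
    if (dst_len == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;        /* byte sequence is invalid in the current locale */
    dst[n] = L'\0';      /* mbstowcs does not terminate when dst fills up */
    return n;
}

/*
 * Try to select a UTF-8 locale for character handling.
 * "C.UTF-8" is an assumed name and may not exist on a given platform.
 * Returns 1 on success, 0 if the locale is unavailable.
 */
int try_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL;
}
```

If try_utf8_locale() returns 0, non-ASCII UTF-8 input passed to locale_to_wcs may be rejected or misread, depending on the C library.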

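The rationale for the C99 standard says as much: the five multibyte functions inherited from C90 are often too restrictive and too primitive for developing portable international programs that manage characters, and C90 deliberately chose not to invent a more complete multibyte and wide-character library, preferring to wait for one to develop naturally as the C language spread around the world.

(Source: the C99 Rationale.)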

So, if you have UTF-8 data in your chars, there is no standard API way to convert it to wchar_ts.
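If an external library is not an option, decoding UTF-8 by hand does not take much code. A minimal sketch, assuming the output is stored as uint32_t code points rather than wchar_t; utf8_decode and utf8_to_u32 are hypothetical names, not a standard API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is empty,
 * truncated, or invalid (overlong form, surrogate, or above U+10FFFF).
 */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { c = s[0];        need = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; need = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; need = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; need = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (len < need)
        return 0;
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return need;
}

/*
 * Decode a NUL-terminated UTF-8 string into dst, which has room for
 * dst_len code points including the terminating 0.
 * Returns the number of code points written (excluding the terminator),
 * or (size_t)-1 on invalid input or insufficient space.
 */
size_t utf8_to_u32(uint32_t *dst, size_t dst_len, const char *src)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t left = strlen(src);
    size_t n = 0;

    while (left > 0) {
        uint32_t cp;
        size_t used = utf8_decode(p, left, &cp);
        if (used == 0 || n + 1 >= dst_len)
            return (size_t)-1;
        dst[n++] = cp;
        p += used;
        left -= used;
    }
    if (n >= dst_len)
        return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

This does not depend on the locale at all, which matters on a platform that only provides the "C" locale.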

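In my opinion, wchar_t should usually be avoided unless it is necessary, for example if you are calling WIN32 APIs. I doubt it will simplify string manipulation. wchar_t is UTF-16LE on Windows, so you may still need more than one wchar_t to represent a single Unicode code point.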
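With 16-bit units, any code point above U+FFFF takes two units (a surrogate pair), so "one element per character" no longer holds. A short sketch of the encoding step; utf16_encode is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16 code units.
 * Writes 1 or 2 units to out and returns how many were written,
 * or 0 if cp is a surrogate value or lies above U+10FFFF.
 */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not code points to encode */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* fits in a single unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+20AC fits in one unit, while U+1F600 becomes the pair 0xD83D 0xDE00, so substring and length operations on such a buffer still have to be aware of pairs.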

I suggest looking into the ICU project, at least from an educational standpoint.
