Convert UTF-16 to UTF-8 under Windows and Linux, in C

I was wondering whether there is a recommended cross-platform method, for both Windows and Linux, to convert strings from UTF-16LE to UTF-8, or whether different methods are needed for each environment.

I managed to find a few references to iconv, but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8.

Can anyone recommend a cross-platform method? If you know of references or a guide with samples, I would really appreciate it.

Thanks, Doori Bar

+20
c unicode utf-8 utf-16
May 19, '10 at 15:48
7 answers

Thanks guys, I was able to meet the cross-platform requirement for Windows and Linux:

  • Downloaded and installed MinGW and MSYS
  • Downloaded the libiconv source package
  • Compiled libiconv via MSYS

That's about it.

0
May 20 '10 at 12:36

Change the encoding to UTF-8 using PowerShell:

 powershell -Command "Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt" 
+6
Mar 11 '15 at 8:46

The ICU library is often used.

+5
May 19 '10 at 18:57

If you do not want to use ICU,

+5
May 20 '10 at 2:08 a.m.
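 #include <iconv.h>

 wchar_t *src = ...;   /* UTF-16 input (assumes a 2-byte wchar_t, as on Windows) */
 size_t srclen = ...;  /* input length in bytes */
 char *dst = ...;      /* output buffer */
 size_t dstlen = ...;  /* output capacity in bytes */

 char *in = (char *)src;  /* iconv expects char ** and size_t * arguments */
 iconv_t conv = iconv_open("UTF-8", "UTF-16LE");
 if (conv != (iconv_t)-1) {
     iconv(conv, &in, &srclen, &dst, &dstlen);
     iconv_close(conv);
 }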
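
For reference, a more complete version of the iconv approach might look like the sketch below. The helper name utf16le_to_utf8_iconv and its signature are made up for illustration; it assumes the glibc/POSIX prototype of iconv (some older libiconv builds declare the input pointer as const char ** instead) and takes the input as raw bytes so that it does not depend on the width of wchar_t.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of iconv): converts src_bytes bytes of
   UTF-16LE into NUL-terminated UTF-8 in dst. Returns the number of UTF-8
   bytes written, excluding the NUL, or (size_t)-1 on any error. */
size_t utf16le_to_utf8_iconv(const char *src, size_t src_bytes,
                             char *dst, size_t dst_cap)
{
    if (dst_cap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");  /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;          /* iconv takes a non-const char ** */
    size_t in_left = src_bytes;
    char *out = dst;
    size_t out_left = dst_cap - 1;   /* keep one byte for the NUL */

    /* iconv advances in/out and decrements the remaining counts */
    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)            /* invalid input or output too small */
        return (size_t)-1;

    *out = '\0';
    return (size_t)(out - dst);
}
```

On Linux with glibc this needs no extra library; with a separately built libiconv (for example under MinGW/MSYS) you would typically link with -liconv.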
+3
May 20 '10 at 2:03 a.m.

I also ran into this problem; I solved it using the Boost.Locale library:

 try {
     // Treat the UTF-16LE buffer as 16-bit code units and convert it to UTF-8
     std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
         (short*)wcontent.c_str(),
         (short*)(wcontent.c_str() + wcontent.length()));
     // Convert the UTF-8 string to the target 8-bit encoding
     content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
 } catch (const boost::locale::conv::conversion_error& e) {
     std::cout << "Failed to convert from UTF-8 to " << toEncoding << "!" << std::endl;
     break;
 }

The function boost::locale::conv::utf_to_utf converts a UTF-16LE-encoded buffer to UTF-8. The function boost::locale::conv::from_utf then converts the UTF-8 buffer to an ANSI encoding; make sure the target encoding is correct (here I use Latin-1, ISO-8859-1).

Another reminder: on Linux, wchar_t (the element type of std::wstring) is 4 bytes wide, but on Windows it is 2 bytes wide, so it is best not to use std::wstring to store a UTF-16LE buffer.
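
To illustrate the point about not depending on the width of wchar_t, here is a minimal sketch in plain C that reads the UTF-16LE data as raw bytes and encodes UTF-8 by hand, with no library at all. The function name utf16le_to_utf8 and its error convention are assumptions made for this example; it rejects unpaired surrogates rather than substituting a replacement character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts src_bytes bytes of UTF-16LE into
   NUL-terminated UTF-8. Returns the number of bytes written (excluding
   the NUL), or (size_t)-1 on malformed input or insufficient space. */
size_t utf16le_to_utf8(const unsigned char *src, size_t src_bytes,
                       char *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;

    if (src_bytes % 2 != 0 || dst_cap == 0)
        return (size_t)-1;

    while (i < src_bytes) {
        /* Assemble one 16-bit code unit, low byte first */
        uint32_t cp = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {        /* high surrogate */
            if (i >= src_bytes)
                return (size_t)-1;                 /* truncated pair */
            uint32_t lo = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;                 /* not a low surrogate */
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                     /* lone low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= dst_cap)
            return (size_t)-1;                     /* keep room for the NUL */

        if (need == 1) {
            dst[o++] = (char)cp;
        } else if (need == 2) {
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char)(0xF0 | (cp >> 18));
            dst[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    dst[o] = '\0';
    return o;
}
```

Because the input is handled byte by byte, the same code should behave identically on Windows and Linux, whatever size wchar_t has there.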

+3
Dec 04 '13 at 4:37

There is also utfcpp, which is a header-only library.

+2
Oct 12 '12 at 17:19