Unicode Portability

I am currently writing an application that uses std::string and char for string operations, which works fine on Linux, because Linux is agnostic to Unicode (or seems to be; I really do not know, so please correct me if I am telling stories here). This style naturally leads to function / class declarations like this:

std::string doSomethingFunkyWith(const std::string& thisdata)
{
    /* .... */
}

However, if thisdata contains Unicode characters, it will not display correctly on Windows, because std::string cannot contain Unicode characters on Windows.

So, I came up with this concept:

namespace MyApplication {
#ifdef UNICODE
    typedef std::wstring  string_type;
    typedef wchar_t       char_type;
#else
    typedef std::string   string_type;
    typedef char          char_type;
#endif

    /* ... */
    string_type doSomethingFunkyWith(const string_type& thisdata)
    {
        /* ... */
    }
}

Is this a good idea to support Unicode on Windows?

My toolchain is gcc/clang on Linux, and Wine + MinGW for Windows support, if this matters.

+5

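Cross-platform problems come from the fact that there are many encodings, and picking the wrong one leads to encóding Ãssues. Once you have dealt with that, you can use std::wstring throughout your program.

The usual workflow is: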

raw_input_data = read_raw_data()
input_encoding = "???" // What is your file or terminal encoding?

unicode_data = convert_to_unicode(raw_input_data, input_encoding)

// Do something with the unicode_data, store in some var, etc.

output_encoding = "???" // Is your terminal output encoding the same as your input?
raw_output_data = convert_from_unicode(unicode_data, output_encoding)

print_raw_data(raw_output_data)

What differs between platforms when handling Unicode is input_encoding and output_encoding. On Linux they are usually UTF-8. On Windows, YMMV.

In C++ you can use a library such as ICU for these conversions.
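
The workflow above can be sketched in portable C++ for the case where both input_encoding and output_encoding are UTF-8. This is only a minimal illustration, not a replacement for a library such as ICU: decode_utf8, encode_utf8 and reverse_text are hypothetical helper names, and the decoder deliberately skips some validation (overlong encodings, surrogate code points).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Simplified: rejects bad lead bytes, truncated sequences and values above
// U+10FFFF, but does not check for overlong encodings or surrogates.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (extra > bytes.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper: encode Unicode code points back into UTF-8 bytes.
std::string encode_utf8(const std::u32string& text)
{
    std::string out;
    for (const char32_t cp : text) {
        assert(cp <= 0x10FFFF);  // precondition: valid Unicode range
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// "Do something with the unicode_data": reverse the text per code point.
// (Combining sequences would still be split; this is only an illustration.)
std::string reverse_text(const std::string& utf8)
{
    std::u32string unicode_data = decode_utf8(utf8);        // raw -> Unicode
    std::reverse(unicode_data.begin(), unicode_data.end()); // process
    return encode_utf8(unicode_data);                       // Unicode -> raw
}
```

Reversing the raw bytes directly would corrupt any multi-byte character, which is why the processing step operates on decoded code points rather than on the std::string itself.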

+4

, - , , . , (, , ..), .

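Linux is not "agnostic" to Unicode: it supports Unicode through UTF-8, which stores Unicode text in plain char. Windows, on the other hand, uses UTF-16, where wchar_t is a 16-bit type.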
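
To make the size difference concrete, here is a small sketch that reports how wide wchar_t is on the current platform and whether a given code point fits into a single wchar_t. The names wchar_bits and fits_in_one_wchar are made up for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Width of wchar_t in bits: typically 16 with Windows toolchains (including
// MinGW) and 32 with gcc/clang on Linux.
constexpr std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;

// Hypothetical helper: true if one wchar_t can hold the code point directly.
// With a 16-bit wchar_t, code points above U+FFFF need a surrogate pair.
bool fits_in_one_wchar(char32_t cp)
{
    assert(cp <= 0x10FFFF);  // precondition: valid Unicode range
    return wchar_bits >= 32 || cp <= 0xFFFF;
}
```

So even a std::wstring is not "one element per character" on Windows, which is one more reason a typedef switch alone does not settle the encoding question.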

typedef , , . , , - .

+5

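Linux does support Unicode; it simply uses UTF-8. A more portable approach is probably to treat every std::string as containing UTF-8 and to convert to UTF-16 only when calling Windows functions. It almost always makes sense to prefer UTF-8 over UTF-16: UTF-8 uses less space for the most common characters (for example, English*), whereas UTF-16 spends at least two bytes on every character, however common it is.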

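You could use your typedefs, but then every function that handles strings has to work with both types. It is simpler to do all internal processing in UTF-8 and convert to / from UTF-16 only at input / output.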

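* Formats such as HTML, XML and JSON use ASCII for their syntax (for example, "<html>", "<body>", etc.), so UTF-8 stays compact even when the content itself is not English.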
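
A minimal sketch of the "UTF-8 inside, UTF-16 at the boundary" idea, written in portable C++ so it can be tried on Linux as well. utf8_to_utf16 is a hypothetical helper name; on Windows itself one would more likely call the Win32 function MultiByteToWideChar with CP_UTF8, or use ICU. The validation here is simplified (no overlong or surrogate checks).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a UTF-8 std::string into UTF-16 code units,
// e.g. right before handing the text to a wide-character API.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::u16string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split into a surrogate pair.
            const char32_t v = cp - 0x10000;
            assert(v <= 0xFFFFF);  // follows from the range check above
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

With a helper like this, doSomethingFunkyWith can keep its std::string signature on every platform, and only the thin layer that talks to Windows needs to know about UTF-16.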

+3

Linux does support Unicode; IO is done in UTF-8, and wide characters are 32 bits. Java, by contrast, uses UTF-16.

For Unicode support there is OpenRTL, http://code.google.com/p/openrtl, which handles UTF-8, UTF-16 and UTF-32 on Windows, Linux, OS X and iOS.

OpenRTL char8_t, char16_t char32_t ++, C ++. , Unicode , .

, OpenRTL, char_t OpenRTL. , UTF8, UTF16 UTF32 Linux, OpenRTL , io. , print_f.

By default, char_t maps to a wide character type, so on Windows it is 16 bits and on Linux 32 bits. But you can also make it 8 bits everywhere, for example. It also supports fast decoding of UTF inside loops using macros.

So, instead of using #ifdef to switch between wchar_t and char, you can build everything using char_t, and OpenRTL will take care of the rest.

+1