Pros and cons of supporting std::wstring exclusively in a cross-platform library

I am currently developing a cross-platform C++ library that is intended to be Unicode-aware. It currently supports compiling against either std::string or std::wstring via typedefs and macros. The drawback of this approach is that it forces you to use macros like L("string") and to rely heavily on templates parameterized on the character type.

What are the arguments for and against supporting only std::wstring?

Would using std::wstring exclusively hinder the GNU/Linux user base, where UTF-8 encoding is preferred?

+6
c++ cross-platform unicode wstring
5 answers

Many people would rather use Unicode with UTF-8 (std::string) than with UCS-2 (std::wstring). UTF-8 is the standard encoding on many Linux distributions and in many databases, so not supporting it would be a significant drawback. On Linux, every call to a function in your library that takes a string argument would require the user to convert a (native) UTF-8 string to std::wstring.

With gcc on Linux, each std::wstring character occupies 4 bytes, while on Windows it occupies 2 bytes. This can lead to strange results when reading or writing files (and when copying them between platforms). I would recommend UTF-8 / std::string for a cross-platform project.
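To illustrate the conversion burden and the platform-dependent width described above, here is a minimal sketch of what a caller holding UTF-8 would have to do before calling a wstring-only interface. The helper names `decode_utf8` and `utf8_to_wide` are hypothetical, and the decoder assumes well-formed input and performs no real validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into code points. Hypothetical helper: assumes well-formed
// input and does not reject overlong or otherwise invalid sequences.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)             { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6) { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE) { cp = b & 0x0F; extra = 2; }
        else                      { cp = b & 0x07; extra = 3; }
        ++i;
        assert(i + extra <= in.size());  // precondition: sequence not truncated
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Convert UTF-8 to std::wstring. The result depends on the platform:
// one unit per code point where wchar_t is 4 bytes, surrogate pairs
// for code points above U+FFFF where wchar_t is 2 bytes.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    for (char32_t cp : decode_utf8(in)) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The same input therefore yields a std::wstring of different length and byte size depending on the platform, which is why writing the raw buffer to a file is not portable.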

+3

What are the arguments for and against supporting only std::wstring?

The argument for using wide characters is that they can do everything narrow characters can, and more.

The arguments against it that I know of:

  • wide characters need more space (which hardly matters; the Chinese do not, in principle, have more headaches over memory than Americans do)
  • using wide characters gives headaches to some Westerners who are used to all their characters fitting into 7 bits (and who are unwilling to learn to pay a little attention so as not to mix uses of the character type for actual characters with other uses)

As for flexibility: I have maintained a library (several kLoC) that could deal with both narrow and wide characters. Most of that came from making the character type a template parameter; I do not remember any macros (other than UNICODE, that is). Not all of it was flexible, though; some code ultimately required either a char or a wchar_t string. (There is no point in making internal key strings wide.)
Users could decide whether they wanted only narrow character support (in which case "string" was fine), only wide character support (which required them to use L"string"), or both (which required something like T("string")).
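A rough sketch of the approach described above, with the character type as a template parameter and a T-style macro for code that must build either way. The names `join`, `MYLIB_WIDE`, `MYLIB_T`, `mylib_char` and `mylib_string` are made up for illustration and are not taken from the library mentioned in the answer.

```cpp
#include <string>

// Works for any character type (char, wchar_t, ...) because the
// character type is a template parameter deduced from the arguments.
template <typename CharT>
std::basic_string<CharT> join(const std::basic_string<CharT>& left,
                              CharT separator,
                              const std::basic_string<CharT>& right) {
    std::basic_string<CharT> out(left);
    out.push_back(separator);
    out += right;
    return out;
}

// Hypothetical build switch for users who want one code base that
// compiles in both narrow and wide configurations.
#ifdef MYLIB_WIDE
typedef wchar_t mylib_char;
#define MYLIB_T(s) L##s
#else
typedef char mylib_char;
#define MYLIB_T(s) s
#endif

typedef std::basic_string<mylib_char> mylib_string;
```

A narrow-only user would call `join(std::string("usr"), '/', std::string("lib"))`, a wide-only user would pass `std::wstring` and `L'/'`, and only code that has to support both configurations needs `MYLIB_T("...")`.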

+2

For:

Against:

  • You may need to interact with code that is not i18n-aware. But, like any good library writer, you will just hide that mess behind a simple interface, right? Right?
+2

I would say that using std::string or std::wstring does not matter.

Either way, neither of them provides proper Unicode support.

If you need internationalization, you need proper Unicode support and you should start investigating libraries like ICU.

After that, it is a matter of which encoding to use, and that depends on the platform you are on: wrap the OS-dependent facilities behind an abstraction layer and convert in the implementation layer where applicable.
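One way such an abstraction layer might look, as a sketch only: the public interface accepts UTF-8, and the implementation converts to whatever the platform's native APIs expect. The namespace `mylib` and the names `native_string` and `to_native` are assumptions made for this example; `MultiByteToWideChar` is the Win32 conversion function, and error handling is reduced to an assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

namespace mylib {

// The string type the OS APIs want: UTF-16 in wchar_t on Windows,
// UTF-8 in char elsewhere (assuming a UTF-8 environment).
#ifdef _WIN32
typedef std::wstring native_string;
#else
typedef std::string native_string;
#endif

// Convert at the boundary so callers only ever deal with UTF-8.
native_string to_native(const std::string& utf8) {
#ifdef _WIN32
    if (utf8.empty()) return native_string();
    const int in_len = static_cast<int>(utf8.size());
    // First call computes the required length in wchar_t units.
    const int out_len =
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, NULL, 0);
    assert(out_len > 0);  // sketch: treat invalid input as a programming error
    native_string out(static_cast<std::size_t>(out_len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &out[0], out_len);
    return out;
#else
    return utf8;  // already in the native encoding
#endif
}

}  // namespace mylib
```

With this shape, the choice between std::string and std::wstring stays an implementation detail behind the interface.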

Don't worry about the encoding used internally by the Unicode library you are using (or building?); that is a performance matter and should not affect how the library itself is used.

+2

Disadvantage:

Because wstring is really UCS-2, not UTF-16. One day it will kick you in the shins. And it will kick hard.
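The pitfall is easiest to see with a character outside the Basic Multilingual Plane. The sketch below uses std::u16string rather than std::wstring so that it behaves the same on every platform; on Windows, where wchar_t is 2 bytes, std::wstring has the same problem. The function name `count_code_points` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-16 string by skipping low (trailing)
// surrogates, which only continue the preceding code point.
// Assumes well-formed UTF-16 (no unpaired surrogates).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

For `u"\U0001F600"`, `size()` reports two code units while `count_code_points` reports one code point; code that treats every element as a complete character (UCS-2 thinking) will split or miscount such text.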

0
