C++: Supporting Unicode in my project

Currently, my C++ project contains about 16 thousand lines of code, and I admit that I did not think about Unicode support at the outset.

All I did was define a custom typedef for std::string as String and jump into coding.

I have never worked with Unicode in the programs I wrote.

  • How difficult is it to switch my project to Unicode now? Is it even a good idea?

  • Is it possible to simply switch to std::wchar without any serious problems?

+7
3 answers

Probably the most important part of making an application Unicode-aware is to keep track of the encoding of your strings and to make sure that your public interfaces are well specified and easy to use with the encodings you want to support.

Switching to a wider character type (in C++, wchar_t) is not necessarily the right solution. In fact, I would say that it is usually not the easiest solution. Some applications can get away with specifying that all strings and interfaces use UTF-8 and do not need to change at all. std::string works perfectly well for UTF-8 encoded strings.

However, if you need to interpret the characters in a string, or to talk to interfaces that do not use UTF-8, you will have to work harder. Without knowing more about your application, it is impossible to recommend one approach over another.
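
To illustrate what "interpreting characters" costs once std::string holds UTF-8, here is a minimal sketch. The function name utf8_code_points is made up for this example, and it assumes the input is already valid UTF-8. It shows that size() reports bytes, so counting code points takes an explicit pass over the data:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to hold valid UTF-8. Continuation bytes have the bit pattern
// 10xxxxxx; every other byte starts a new code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a string such as "na\xC3\xAFve" (the word "naïve" in UTF-8), size() gives the byte length while this helper gives the number of code points. Note that code points are still not the same thing as user-perceived characters, which can span several code points.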

+7

There are some problems with using std::wstring. If your application stores text as Unicode and has to run on different platforms, you may run into trouble. std::wstring relies on wchar_t, whose size is compiler dependent. In Microsoft Visual C++ this type is 16 bits wide, so wide strings there are UTF-16. The GNU C++ compiler on Linux defines it as 32 bits wide, so wide strings there are UTF-32. If you write text to a file on one system (say, Windows/VC++) and then read that file on another (Linux/GCC), you have to account for the difference (in this case, by converting from UTF-16 to UTF-32).
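
As a rough sketch of the UTF-16 to UTF-32 conversion mentioned above, the function below uses the fixed-width char16_t and char32_t string types rather than wchar_t, so it behaves the same on both platforms. The name utf16_to_utf32 and the choice to replace unpaired surrogates with U+FFFD are assumptions made for this example, not part of any standard API:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to UTF-32 code points.
// A valid surrogate pair is combined into one code point; an unpaired
// surrogate is replaced with U+FFFD (an assumption of this sketch).
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        bool has_low = i + 1 < in.size() &&
                       in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_high && has_low) {
            char32_t low = in[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed
        } else if (unit >= 0xD800 && unit <= 0xDFFF) {
            out.push_back(0xFFFD);  // unpaired surrogate
        } else {
            out.push_back(unit);
        }
    }
    return out;
}
```

Byte order is a separate concern when the UTF-16 data comes from a file; this sketch assumes the code units have already been read in native order.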

+2

Is it possible to simply switch to [std::wchar_t] without any serious problems?

No, it's not that simple.

  • The encoding of a wchar_t string is platform dependent. Windows uses UTF-16. Linux typically uses UTF-32. (C++0x will mitigate this difference by introducing the separate types char16_t and char32_t.)
  • If you need to support Unix-like systems, you do not have all the UTF-16 functions that Windows has, so you need to write your own _wfopen, etc.
  • Do you use any third-party libraries? Do they support wchar_t?
  • Although wide characters are commonly used for in-memory representation, formats on disk and on the Web are much more likely to be UTF-8 (or another char-based encoding) than UTF-16/32. You will have to convert between them.
  • You can't just search and replace char with wchar_t, because C++ conflates "character" and "byte", and you need to determine which chars are characters and which chars are bytes.
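
To give an idea of the conversion work the points above imply, here is a minimal sketch that encodes UTF-32 code points as UTF-8 bytes before they go to disk or the network. The names append_utf8 and utf32_to_utf8 are hypothetical, and substituting U+FFFD for invalid code points is an assumption of this example:

```cpp
#include <string>

// Hypothetical helper: appends one code point to `out` as UTF-8.
// Values above U+10FFFF and surrogates are replaced with U+FFFD.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        cp = 0xFFFD;
    }
    if (cp < 0x80) {            // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                    // 4 bytes: 11110xxx followed by three 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: encodes a whole UTF-32 string as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        append_utf8(out, cp);
    }
    return out;
}
```

Here the std::string result is a sequence of bytes, while the std::u32string input is a sequence of characters, which is exactly the distinction a blind search and replace would lose.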
+1