(Encoded) String Handling in C++ - Questions / Best Practices?

What are the best practices for handling strings in C++? I am wondering how to handle the following cases:

  • Input and output of files and XML files, which can be written in different encodings. What is the recommended way to handle this, and how do I retrieve the values? I assume an XML node may contain UTF-16 text, and then I need to work with it somehow.

  • How to handle char* strings. After all, char can be signed or unsigned, and I wonder how to determine which encoding such strings use (ANSI?) and how to convert them to UTF-8. Is there any recommended reading on this, where the basic guarantees of C/C++ strings are documented?

  • String algorithms for UTF-8 strings and the like: computing the length, parsing, etc. How is this best done?

  • Which character type is really portable? I found out that wchar_t can be anything from 8 to 32 bits wide, which makes it a poor choice if I want to be consistent across platforms (especially when moving data between different platforms, which seems to be a problem, as described for example in EASTL, see item #13).

I am currently using std::string everywhere, with a small utility for converting to UTF-16 when calling Unicode-aware APIs, but I am fairly sure this is not the best way. Using something like Qt's QString or the ICU string class seems to be right, but I wonder whether there is a more lightweight approach (i.e. if my char strings are ANSI-encoded, and the subset of ANSI that is used is also valid UTF-8, then I can simply treat the data as UTF-8, provide converters from/to UTF-8, and be done, since I can store it in std::string, unless there is a problem with this approach).
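The approach described above (UTF-8 kept in std::string, converted to UTF-16 only at the API boundary) could be sketched roughly as follows. The name utf8_to_utf16 is a hypothetical helper, not a function from any library; the sketch assumes C++11 and, as a simplification, does not reject overlong sequences or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes held in a std::string and
// re-encode them as UTF-16. Simplified: overlong forms and encoded
// surrogates are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Always inspect bytes as unsigned char; plain char may be signed.
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Pure ASCII input passes through byte for byte, which is why ASCII-only char data can be treated as UTF-8 without any conversion step.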

+3
2 answers

UTF-16 ; Java/#/Python 3.0 . , wchar_t 16 32 , ; , API, wcrtomb(), wchar_t *, UTF-8 , , .

XML.

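"Input and output of files and XML files, which can be written in different encodings. What is the recommended way to handle this? I assume an XML node may contain UTF-16 text, and then I need to work with it somehow."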

, . . UTF-16 , ASCII . , XML , , UTF-16, UTF-16 . , UTF-16, : ? : § 4.3.3:

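"In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration."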

, XML , ; , . , , UTF-16 .
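Section 4.3.3 of the XML specification, cited above, ties encoding detection to the byte order mark and the encoding declaration. As a rough illustration of the first half only, the sketch below checks the leading bytes of a document for a BOM. The names Bom and detect_bom are hypothetical, UTF-32 marks are deliberately not handled, and a real XML parser does considerably more than this (it also reads the encoding declaration).

```cpp
#include <string>

// Hypothetical result type for the sketch.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a raw document for a byte order mark.
// Not handled: UTF-32 (its little-endian BOM also starts with FF FE).
Bom detect_bom(const std::string& bytes) {
    // Read bytes as unsigned char so values >= 0x80 compare as expected.
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return Bom::Utf8;
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return Bom::Utf16LE;
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return Bom::Utf16BE;
    // No BOM: per the specification, the document is then UTF-8
    // unless an encoding declaration says otherwise.
    return Bom::None;
}
```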

Webography:

+3

"String algorithms for UTF-8 strings and the like: computing the length, parsing, etc. How is this best done?"

mbrlen C. , std::string , wstring .

, probaby UTF-16 UTF-8 - ( , , , ).
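The answer above names mbrlen, whose behaviour depends on the current C locale. As a locale-independent alternative (a different technique, not what the answer describes), the length of a UTF-8 string in code points can be obtained by counting every byte that is not a continuation byte. The name utf8_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes valid UTF-8; no validation is done.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Note that this counts code points, not user-perceived characters: a base letter followed by a combining accent counts as two.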

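"How to handle char* strings. After all, char can be signed or unsigned, and I wonder how to determine which encoding such strings use (ANSI?) and how to convert them to UTF-8. Is there any recommended reading on this, where the basic guarantees of C/C++ strings are documented?"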

, , , , 8- . C ASCII, . , , ISO-8859-x, .

UTF-8 , , . C, , , ( , ). C . mbrlen mbrtowc. Linux , LC_CTYPE, , , , . , API , .

"How to handle char* strings. After all, char can be signed or unsigned,"

char, . , char , , ; , , char , a > 0 ( a char) undefined. ?
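Because the signedness of plain char is implementation-defined, a test such as a > 0 on a char can give different results on different platforms for bytes with the high bit set. A common way around this is to convert to unsigned char before comparing. The sketch below uses a hypothetical helper name, is_ascii, purely for illustration.

```cpp
#include <string>

// Return true if every byte is in the 7-bit ASCII range.
// Each byte is converted to unsigned char first, so the result does not
// depend on whether plain char is signed on the target platform.
bool is_ascii(const std::string& s) {
    for (char c : s) {
        if (static_cast<unsigned char>(c) >= 0x80) {
            return false;
        }
    }
    return true;
}
```

The same cast is what the functions in <cctype> (isalpha, toupper and so on) expect from their callers when they are given char data.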

+1
