String Encoding Detection in C/C++

Given a string as a pointer to an array of bytes (chars), how can I detect the encoding of the string in C/C++ (I am using Visual Studio 2008)? I did a search, but most of the samples are in C#.

Thanks.

+5
2 answers

Assuming you know the length of the input array, you can make the following guesses:

  • First, check whether the first few bytes match any of the well-known Unicode byte order marks (BOM). If they do, you're done!
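  • Next, search for '\0' before the last byte. If you find one, you might be dealing with UTF-16 or UTF-32. If you find multiple consecutive '\0' bytes, it is probably UTF-32.
  • If any byte is in the range 0x80 to 0xff, the input is certainly not ASCII or UTF-7. If you are restricting your input to some variant of Unicode, you can assume it is UTF-8. Otherwise, you have to do some guessing to determine which multi-byte character set it is. That will not be fun.
  • At this point the input is one of: ASCII, UTF-7, Base64, or ranges of UTF-16 or UTF-32 that happen not to use the high bit and contain no null characters.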
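
The checks in this list can be sketched roughly as follows. This is only an illustration of the guessing order, not a reliable detector. The names EncodingGuess, startsWith and guessEncoding are made up for this sketch, and it assumes that len does not count any trailing '\0' terminator. It sticks to C++98 so that it should also build with older compilers such as Visual Studio 2008.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result type for this sketch (not part of any library).
enum EncodingGuess {
    GUESS_UTF8_BOM,
    GUESS_UTF16_LE_BOM,
    GUESS_UTF16_BE_BOM,
    GUESS_UTF32_LE_BOM,
    GUESS_UTF32_BE_BOM,
    GUESS_UTF32_NO_BOM,       // consecutive NUL bytes seen
    GUESS_UTF16_NO_BOM,       // isolated NUL bytes seen
    GUESS_UTF8_OR_MULTIBYTE,  // bytes >= 0x80 seen, no NUL bytes
    GUESS_ASCII_COMPATIBLE    // 7-bit only: ASCII, UTF-7, Base64, ...
};

// Returns true if the buffer begins with the given byte sequence.
static bool startsWith(const unsigned char* data, std::size_t len,
                       const char* prefix, std::size_t prefixLen) {
    return len >= prefixLen && std::memcmp(data, prefix, prefixLen) == 0;
}

// Assumption: len is the number of payload bytes, without a terminator.
EncodingGuess guessEncoding(const unsigned char* data, std::size_t len) {
    assert(data != NULL || len == 0);

    // Step 1: byte order marks. The UTF-32 LE mark (FF FE 00 00) begins
    // with the UTF-16 LE mark (FF FE), so UTF-32 is tested first. Note
    // that FF FE 00 00 could also be UTF-16 LE text starting with U+0000.
    if (startsWith(data, len, "\x00\x00\xFE\xFF", 4)) return GUESS_UTF32_BE_BOM;
    if (startsWith(data, len, "\xFF\xFE\x00\x00", 4)) return GUESS_UTF32_LE_BOM;
    if (startsWith(data, len, "\xEF\xBB\xBF", 3))     return GUESS_UTF8_BOM;
    if (startsWith(data, len, "\xFE\xFF", 2))         return GUESS_UTF16_BE_BOM;
    if (startsWith(data, len, "\xFF\xFE", 2))         return GUESS_UTF16_LE_BOM;

    // Steps 2 and 3: one pass that records NUL bytes and high-bit bytes.
    bool sawNul = false;
    bool sawConsecutiveNul = false;
    bool sawHighBit = false;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == 0) {
            if (i > 0 && data[i - 1] == 0) sawConsecutiveNul = true;
            sawNul = true;
        } else if (data[i] >= 0x80) {
            sawHighBit = true;
        }
    }

    if (sawConsecutiveNul) return GUESS_UTF32_NO_BOM;
    if (sawNul)            return GUESS_UTF16_NO_BOM;
    if (sawHighBit)        return GUESS_UTF8_OR_MULTIBYTE;

    // Step 4: nothing distinctive was found.
    return GUESS_ASCII_COMPATIBLE;
}
```

Calling guessEncoding(buffer, length) on the raw bytes returns one of the guesses above. Every result other than a BOM match is only a hint, so a caller would normally confirm it, for example by trying to decode the data with the guessed encoding.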
+6

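This is not an easy problem to solve. Detection generally relies on heuristics that take a best guess at the input encoding, and those heuristics can be tripped up by relatively innocuous inputs; see, for example, the Notepad Redux article.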

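If you want a Windows-only solution with minimal dependencies, you can use a combination of IsTextUnicode and MLang's DetectInputCodePage to attempt character set detection.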

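If you need portability and do not mind taking on a fairly large dependency, ICU provides character set detection routines that do the same job in a portable way.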

+3
