Character set subsets

  • Is iso-8859-1 a proper subset of utf-8?
  • What about iso-8859-n?
  • What about windows-1252?

If the answer to any of the above is no, what are the disjoint characters? I am testing some logic that detects encodings and want to write tests to verify that the detection works correctly.


Is iso-8859-1 a proper subset of utf-8?

The ISO-8859-1 repertoire (the first 256 Unicode characters) is a proper subset of the UTF-8 repertoire (every Unicode character).

However, the characters U+0080 to U+00FF are encoded differently in the two encodings:

  • ISO-8859-1: one byte each, 80 to FF.
  • UTF-8: two bytes each, C2 80 to C3 BF.
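A quick way to check this in a test suite is to encode each of the first 256 code points both ways. This is a minimal sketch using only Python's built-in `latin-1` and `utf-8` codecs; the helper name `latin1_vs_utf8` is hypothetical. The last part shows why the difference matters for detection: a Latin-1 byte in the 80..FF range is usually not valid UTF-8 by itself.

```python
# Compare how ISO-8859-1 and UTF-8 encode the first 256 code points.
# latin1_vs_utf8 is a hypothetical helper name, not a library function.

def latin1_vs_utf8(code_point: int) -> tuple[bytes, bytes]:
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for one code point in U+0000..U+00FF."""
    ch = chr(code_point)
    return ch.encode("latin-1"), ch.encode("utf-8")

# Every Latin-1 character can also be encoded in UTF-8 (the repertoire is a subset)...
for cp in range(256):
    latin1, utf8 = latin1_vs_utf8(cp)
    assert utf8.decode("utf-8") == latin1.decode("latin-1")

# ...but only U+0000..U+007F use the same bytes in both encodings.
print(latin1_vs_utf8(0x41))  # (b'A', b'A')
print(latin1_vs_utf8(0x80))  # (b'\x80', b'\xc2\x80')
print(latin1_vs_utf8(0xE9))  # (b'\xe9', b'\xc3\xa9')
print(latin1_vs_utf8(0xFF))  # (b'\xff', b'\xc3\xbf')

# Latin-1 text containing bytes 80..FF is typically rejected by a strict UTF-8
# decoder, which is one signal an encoding detector can rely on.
try:
    b"caf\xe9".decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```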

What about iso-8859-n?

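There are 15 parts of ISO 8859, with 614 characters between them, so listing everything that is missing from ISO-8859-1 for all of them would be long.

Taking ISO-8859-2 as an example, these are the characters in -2 that are not in -1: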

Ă㥹ĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝

What about windows-1252?

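Windows-1252 is the same as ISO-8859-1 except in the range 0x80-0x9F, where ISO-8859-1 has control characters and Windows-1252 has printable characters instead. So the characters in Windows-1252 that are not in ISO-8859-1 are: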

ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
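The same check can be done byte by byte against Python's built-in `cp1252` and `latin-1` codecs. This is a sketch assuming Python's `cp1252` table is the reference; `decode_or_none` is a hypothetical helper. It confirms that every difference falls in 0x80-0x9F and reproduces the list above. In Python's table, five bytes in that range (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned and fail to decode, which is worth covering in detection tests as well.

```python
# Compare windows-1252 with ISO-8859-1, one byte at a time.
# decode_or_none is a hypothetical helper, not a library function.

def decode_or_none(b: int, codec: str):
    """Decode a single byte, returning None if the codec leaves it undefined."""
    try:
        return bytes([b]).decode(codec)
    except UnicodeDecodeError:
        return None

# Bytes whose meaning differs between the two codecs.
differs = [b for b in range(256)
           if decode_or_none(b, "cp1252") != decode_or_none(b, "latin-1")]

# Every difference lies in 0x80..0x9F; all other bytes decode identically.
print(all(0x80 <= b <= 0x9F for b in differs))  # True

# The printable windows-1252 characters in that range, sorted by code point.
extra = sorted(c for c in (decode_or_none(b, "cp1252") for b in differs) if c is not None)
print("".join(extra))

# Positions that Python's cp1252 codec leaves unassigned.
print([hex(b) for b in differs if decode_or_none(b, "cp1252") is None])
```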
