Special character character set

Question


Is iso-8859-1 a proper subset of utf-8?
What about iso-8859-n?
What about windows-1252?

If the answer to any of the above is no, which characters are disjoint? I am testing some logic that detects encodings and want to write tests to verify that the detection works correctly.


utf-8 iso-8859-1 windows-1252 iso-8859-2

Sean jezewski Apr 05 '12 at 1:42


2 answers

All of these character sets are covered by Unicode. The mapping tables to Unicode are published here: http://unicode.org/Public/MAPPINGS/.


deceze Apr 05 '12 at 2:11

dan04 · Accepted Answer · 2012-04-05T02:33:22+0000

Is iso-8859-1 a proper subset of utf-8?

The ISO-8859-1 character repertoire (the first 256 Unicode characters) is a proper subset of the UTF-8 repertoire (every Unicode character).

However, the characters U+0080 to U+00FF are encoded differently in the two encodings.

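ISO-8859-1 encodes each of these characters as a single byte, 80 to FF.
UTF-8 encodes the same characters as two-byte sequences, C2 80 to C3 BF.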
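The byte-level difference can be checked directly. The following is a minimal Python sketch, assuming the standard library codecs `iso-8859-1` and `utf-8`; the helper name `encodings_of` is made up for illustration.

```python
def encodings_of(code_point):
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for one code point."""
    ch = chr(code_point)
    return ch.encode("iso-8859-1"), ch.encode("utf-8")

# The two ends of the range discussed above.
first = encodings_of(0x80)
last = encodings_of(0xFF)

# ASCII (U+0000..U+007F) is byte-identical in both encodings.
ascii_same = all(
    chr(cp).encode("iso-8859-1") == chr(cp).encode("utf-8")
    for cp in range(0x80)
)

# Every code point in U+0080..U+00FF takes one byte in ISO-8859-1
# and two bytes in UTF-8.
high_differs = all(
    len(latin1) == 1 and len(utf8) == 2
    for latin1, utf8 in (encodings_of(cp) for cp in range(0x80, 0x100))
)

print(first, last, ascii_same, high_differs)
```

For detection tests, this means a lone byte in the range 80 to FF is never valid UTF-8 on its own, while any byte sequence decodes without error as ISO-8859-1.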

What about iso-8859-n?

The 15 parts of ISO 8859 together contain 614 characters. I won't list every "special" character in ISO 8859 here.

As an example, take ISO-8859-2. The characters that are in -2 but not in -1 are:

ĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
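A list like the one above can be regenerated for any pair of single-byte encodings. This is a rough Python sketch, assuming Python's built-in `iso-8859-1` and `iso-8859-2` codecs match the standards closely enough for testing; `repertoire` is a hypothetical helper name.

```python
def repertoire(encoding):
    """Set of characters that a single-byte encoding can represent."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            pass  # this byte value is unassigned in the encoding
    return chars

latin1 = repertoire("iso-8859-1")
latin2 = repertoire("iso-8859-2")

# Characters representable in ISO-8859-2 but not in ISO-8859-1,
# sorted by Unicode code point.
only_in_latin2 = sorted(latin2 - latin1)
print("".join(only_in_latin2))
```

Swapping in another part, for example `iso-8859-5`, gives the corresponding disjoint set for that part, which can serve as test input for a detector.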

What about windows-1252?

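Windows-1252 is the same as ISO-8859-1, except that it assigns printable characters to the range 0x80-0x9F instead of control characters. The characters that are in windows-1252 but not in ISO-8859-1 are: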

ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
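For tests of encoding-detection logic, the 0x80-0x9F range is a natural source of test vectors. Below is a small Python sketch, assuming Python's `cp1252` codec (which leaves five byte values in that range unassigned); the sample string and the helper `can_decode` are invented for illustration and are not part of any detection library.

```python
# Map each byte in 0x80-0x9F to its windows-1252 character, where one exists.
cp1252_extras = {}
for byte in range(0x80, 0xA0):
    try:
        cp1252_extras[byte] = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        pass  # unassigned in Python's cp1252 codec

def can_decode(data, encoding):
    """True if `data` decodes under `encoding` without an error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical sample: the same text yields different bytes per encoding.
text = "caf\u00e9 \u20ac5"          # "café €5"
as_cp1252 = text.encode("cp1252")
as_utf8 = text.encode("utf-8")

print(sorted(cp1252_extras.values()))
print(as_cp1252, as_utf8)
print(can_decode(as_cp1252, "utf-8"), can_decode(as_utf8, "cp1252"))
```

Note that the windows-1252 bytes here are not valid UTF-8, but the UTF-8 bytes do decode as windows-1252 (as mojibake), so a detector generally has to try UTF-8 first or use additional heuristics.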
