If all else fails, read the spec :-).
4.3.3 Character Encoding in Entities
Each external parsed entity in an XML document may use a different encoding for its characters.
[...]
In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997.
It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix.
Source: http://www.w3.org/TR/REC-xml/
So UTF-8 is written as encoding="UTF-8".
For other character sets not listed above, use the names specified in the IANA character set list.
The case of the letters in the character set name does not matter: "However, no distinction is made between use of upper and lower case letters." (IANA character set list). So you can also write encoding="utf-8" if you prefer ;-).
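As a quick check of the points above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The helper names `make_document` and `parse_text` and the sample text are made up for illustration; the only assumption is that the underlying parser (expat in CPython) accepts these encoding names, which are among the ones the spec lists.

```python
import xml.etree.ElementTree as ET


def make_document(encoding_name: str, codec: str, text: str) -> bytes:
    """Build a tiny XML document (hypothetical helper).

    `encoding_name` is the name written into the XML declaration;
    `codec` is the Python codec used to produce the actual bytes.
    """
    declaration = f'<?xml version="1.0" encoding="{encoding_name}"?>'
    return (declaration + f"<greeting>{text}</greeting>").encode(codec)


def parse_text(document: bytes) -> str:
    """Parse the bytes and return the root element's text (hypothetical helper)."""
    return ET.fromstring(document).text


TEXT = "Grüße"

# Same bytes, declaration differs only in letter case.
upper = parse_text(make_document("UTF-8", "utf-8", TEXT))
lower = parse_text(make_document("utf-8", "utf-8", TEXT))

# A different encoding, declared with the name from the spec.
latin = parse_text(make_document("ISO-8859-1", "iso-8859-1", TEXT))
```

All three calls should decode to the same string, since the parser reads the encoding name from the declaration and compares it without regard to case.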
BTW: Are you really sure you want to write your own XML parser? It sounds suspiciously like reinventing the wheel.