Setting encoding in XML files

Question

What are the valid XML encoding strings? For example, which is the right way to specify UTF-8:

encoding="utf-8"
encoding="utf8"
etc.

Or Windows 1251:

encoding="windows-1251"
encoding="windows1251"
encoding="cp-1251"
etc.

I am writing a character decoder as well as an XML parser, so I need to be able to set the encoding of my StreamReader based on the value of the encoding attribute.

Any ideas where I could find a list of the official encoding strings?

The best I could find is this, but it seems to be specific to IE.

Thanks!

+4

xml encoding

Albus dumbledore Oct 19 '10 at 9:49

3 answers

 <?xml version="1.0" encoding="utf-8"?>

should be good for utf-8.

+2

Shikiryu Oct 19 '10 at 9:54

Use the locale -a command to list the locales, and their encodings, available on the system: http://dwbitechguru.blogspot.ca/2014/07/check-foreign-characters-support-on.html

Option A: To add an encoding, use the following tags:

You can edit the encoding attribute in the DTD using XMLSpy.

Related links: http://dwbitechguru.blogspot.ca/2014/07/issue-xml-reader-error.html

0

dwbiguru Sep 10 '14 at 20:52

sleske · Accepted Answer · 2010-10-19T09:55:18+0000

If all else fails, read the spec :-).

4.3.3 Character Encoding in Entities
Each external parsed entity in an XML document may use a different encoding for its characters.
[...]
In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997.
It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix.

Source: http://www.w3.org/TR/REC-xml/

So, UTF-8 is written as encoding="UTF-8".

For other character sets not listed above, use the names specified in the IANA character set list.

The case of the letters in a character set name does not matter: “However, no distinction is made between the use of upper and lower case letters.” (IANA character set list). So you can also write encoding="utf-8" if you like ;-).
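
For the decoder side of the question, here is a rough Python sketch of reading the declared name and resolving it case-insensitively. The helper names declared_codec and decode_xml are made up for illustration, and the regular expression assumes an ASCII-compatible encoding with no byte order mark; UTF-16 input would need BOM detection first. Note also that Python's codec registry accepts aliases such as "utf8" that are not the names the spec recommends, so this is more lenient than a strict check against the IANA list.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the data. Assumption: the bytes are ASCII-compatible and have no BOM.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_codec(data: bytes, default: str = "UTF-8") -> str:
    """Return Python's normalized codec name for the declared encoding.

    Falls back to `default` when there is no encoding declaration.
    Raises LookupError if Python has no codec registered under that name.
    """
    match = _DECL.match(data)
    label = match.group(1).decode("ascii") if match else default
    # codecs.lookup ignores letter case, so "UTF-8" and "utf-8" are the same.
    return codecs.lookup(label).name


def decode_xml(data: bytes) -> str:
    """Decode a whole XML document using its declared encoding."""
    return data.decode(declared_codec(data))
```

With this sketch, a document declaring encoding="UTF-8" and one declaring encoding="utf-8" resolve to the same codec, and an unknown name surfaces as a LookupError rather than being silently guessed.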

BTW: Are you really sure you want to write your own XML parser? It sounds suspiciously like reinventing the wheel.
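
To illustrate the point about existing parsers: most of them already read the encoding declaration for you when handed raw bytes. A minimal sketch using Python's standard xml.etree.ElementTree (the sample document is invented for the example):

```python
import xml.etree.ElementTree as ET

# A small sample document, encoded as ISO-8859-1 to match its declaration.
# "\u00e9" is the letter e with an acute accent.
raw = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<word>caf\u00e9</word>"
).encode("iso-8859-1")

# Passing bytes (not str) lets the parser honor the encoding declaration.
root = ET.fromstring(raw)
print(root.tag, root.text)
```

The text comes back as an ordinary Unicode string, with no manual decoding step.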
