Setting encoding in XML files

What are the valid XML encoding strings? For example, what is the correct way to specify UTF-8:

  • encoding="utf-8"
  • encoding="utf8"
  • etc.

Or Windows 1251:

  • encoding="windows-1251"
  • encoding="windows1251"
  • encoding="cp-1251"
  • etc.

I am writing a character decoder as well as an XML parser, so I need to set the encoding of my StreamReader based on the value of the encoding attribute.

Any ideas where I could find a list of the official encoding strings?

The best I could find is this, but it seems to be specific to IE.

Thanks!

+4
3 answers

If all else fails, read the spec :-).

4.3.3 Character Encoding in Entities

Each external parsed entity in an XML document may use a different encoding for its characters.

[...]

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997.

It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix.

Source: http://www.w3.org/TR/REC-xml/

So, UTF-8 is written as encoding="UTF-8".

For other character sets not listed above, use the names specified in the IANA character set list.

The case of the letters in the character set name does not matter: "However, no distinction is made between the use of upper and lower case letters." (IANA character set list). So you can also write encoding="utf-8" if you like ;-).

BTW: Are you really sure you want to write your own XML parser? It sounds suspiciously like reinventing the wheel.
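To illustrate the case-insensitive matching, here is a minimal Python sketch (the question mentions StreamReader, so treat this as an illustration rather than a drop-in solution). The helper names declared_encoding and resolve_codec are made up for this example. It assumes the document starts with an ASCII-compatible XML declaration, so UTF-16 input would need byte-order-mark detection first, and it relies on Python's codec registry, which accepts many but not all IANA names and aliases.

```python
import codecs
import re

# Matches an XML declaration at the very start of the data and captures the
# value of the encoding pseudo-attribute. Deliberately lenient about quotes.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding name from the XML declaration, or the default."""
    match = _XML_DECL.match(data)
    if match is None:
        return default
    return match.group(1).decode("ascii")


def resolve_codec(name: str) -> str:
    """Map an encoding name to Python's canonical codec name.

    codecs.lookup ignores letter case and raises LookupError for
    names it does not recognise.
    """
    return codecs.lookup(name).name


doc = b'<?xml version="1.0" encoding="Windows-1251"?><root/>'
name = declared_encoding(doc)    # the name exactly as written in the document
codec = resolve_codec(name)      # the codec Python will actually use
```

An unrecognised spelling raises LookupError, which a parser can report as an unsupported encoding instead of guessing.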

+6
 <?xml version="1.0" encoding="utf-8"?> 

should be good for utf-8.
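As a quick check that an existing parser already honours the declaration, here is a small Python sketch using the standard library's xml.etree.ElementTree. The helper name root_text and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def root_text(document: bytes) -> Optional[str]:
    """Parse raw bytes and return the text of the root element.

    The parser reads the encoding from the XML declaration itself,
    so the caller passes bytes and never decodes them by hand.
    """
    return ET.fromstring(document).text


# The same word, serialized in two different encodings.
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?><word>caf\u00e9</word>'
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\u00e9</word>'
).encode("iso-8859-1")

same = root_text(utf8_doc) == root_text(latin1_doc)
```

Both documents decode to the same string even though their bytes differ, because the declaration tells the parser how to read them.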

+2

Use the locale -a command to list the locales (and their encodings) available on the system: http://dwbitechguru.blogspot.ca/2014/07/check-foreign-characters-support-on.html

Option A: Add the encoding using the following tags:

You can edit the encoding attribute in the DTD using XMLSpy.

Related links: http://dwbitechguru.blogspot.ca/2014/07/issue-xml-reader-error.html

0