Why does XSLT output encoding="UTF-8" not convert an ISO-8859-1 character?

Why is an ISO-8859-1 character not converted to UTF-8 in the output file when I set the output encoding to UTF-8?

I have an ISO-8859-1 encoded XML input file, and its encoding is declared. I want to convert it to UTF-8. My understanding is that setting the output encoding in the XSLT stylesheet should control this character conversion.

Is my understanding wrong? If not, why does the following simple test case write the ISO-8859-1 character into an output file that declares UTF-8?

My input file is as follows:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <data>ö</data>

My stylesheet is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output encoding="UTF-8" />
      <xsl:template match="/">
        <result>
          <xsl:value-of select="." />
        </result>
      </xsl:template>
    </xsl:stylesheet>

Running saxon9he from the command line, I get this result:

    <?xml version="1.0" encoding="UTF-8"?>
    <result>ö</result>

According to BabelPad, the ö in my result file is the byte 0xF6, which is not valid UTF-8 (in UTF-8, ö should be the two bytes 0xC3 0xB6). The ö seems to pass through the transformation untouched.
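To confirm the byte values involved, you can encode ö (code point U+00F6) both ways. The following Java sketch is only an illustration; the class and method names are hypothetical. In ISO-8859-1 the character is the single byte F6, in UTF-8 it is C3 B6, and a lone F6 is not a valid UTF-8 sequence, so a lenient decoder replaces it with U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OUmlautBytes {
    static final String O_UMLAUT = "\u00F6"; // ö, code point U+00F6

    // Space-separated uppercase hex of each byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hex of ö encoded in the given charset.
    static String encodedHex(Charset cs) {
        return hex(O_UMLAUT.getBytes(cs));
    }

    // Decodes the ISO-8859-1 byte as if it were UTF-8; new String(...)
    // replaces malformed input with U+FFFD instead of throwing.
    static String latin1MisreadAsUtf8() {
        byte[] latin1 = O_UMLAUT.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + encodedHex(StandardCharsets.ISO_8859_1)); // F6
        System.out.println("UTF-8:      " + encodedHex(StandardCharsets.UTF_8));      // C3 B6
        System.out.println("F6 read as UTF-8: U+"
            + Integer.toHexString(latin1MisreadAsUtf8().charAt(0)).toUpperCase());
    }
}
```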

Thanks for any help!

1 answer

I can see two possible explanations (there may be others).

(a) The final stage of serialization, that is, the conversion of characters to bytes, is being performed not by the XSLT processor but by some other software that has no access to the stylesheet. This happens, for example, if you run the transformation from a Java application that sends the result to a Writer rather than to an OutputStream: the Writer converts characters to bytes using the platform's default encoding, which is probably ISO-8859-1.
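Explanation (a) can be reproduced with a minimal Java sketch using the standard JAXP API (javax.xml.transform), which Saxon also implements. The sketch assumes the JDK's built-in TransformerFactory rather than Saxon, uses an ISO-8859-1 OutputStreamWriter to stand in for a Writer running with that platform default, and uses hypothetical class and method names. With an OutputStream the processor does the encoding itself and honours xsl:output, so ö comes out as C3 B6; with a Writer the output still declares UTF-8, but the bytes are whatever the Writer's encoding produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WriterVsStream {
    // The stylesheet from the question.
    static final String XSL =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/\">"
        + "<result><xsl:value-of select=\".\"/></result>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // The input from the question as ISO-8859-1 bytes: ö is the single byte 0xF6.
    static final byte[] INPUT =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><data>\u00F6</data>"
            .getBytes(StandardCharsets.ISO_8859_1);

    // Runs the transformation and sends its output to the given Result.
    static void transform(Result result) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.transform(new StreamSource(new ByteArrayInputStream(INPUT)), result);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // OutputStream: the processor encodes characters itself, honouring xsl:output.
    static byte[] viaOutputStream() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transform(new StreamResult(out));
        return out.toByteArray();
    }

    // Writer: the processor only hands over characters; the Writer's own
    // encoding (ISO-8859-1 here, standing in for a platform default) decides the bytes.
    static byte[] viaWriter() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            transform(new StreamResult(w));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    // True if `seq` occurs somewhere in `data`.
    static boolean containsSequence(byte[] data, byte... seq) {
        outer:
        for (int i = 0; i + seq.length <= data.length; i++) {
            for (int j = 0; j < seq.length; j++) {
                if (data[i + j] != seq[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Both outputs declare encoding="UTF-8"; only the first actually is UTF-8.
        System.out.println("OutputStream contains C3 B6: "
            + containsSequence(viaOutputStream(), (byte) 0xC3, (byte) 0xB6));
        System.out.println("Writer contains F6: "
            + containsSequence(viaWriter(), (byte) 0xF6));
    }
}
```

If an API forces you to supply a Writer, constructing it as new OutputStreamWriter(out, StandardCharsets.UTF_8) keeps its encoding in line with the declaration; passing the OutputStream is simpler, because then the processor takes the encoding from xsl:output.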

(b) The octets you see on screen are not the octets stored on disk but some transformation of them. This can happen when you load a file into an editor and then ask for a hexadecimal view; in some cases you get a hex display of the editor's internal representation of the document rather than of what is stored on disk.
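For explanation (b), one way to take the editor out of the picture is to read the file's bytes directly. The following Java sketch is a hypothetical helper, not part of Saxon or BabelPad: it prints a hex dump of the file named on the command line and reports whether those bytes decode as strict UTF-8. For a correct UTF-8 result file, the ö should appear as C3 B6.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DiskBytes {
    // Space-separated uppercase hex of each byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Strict check: any malformed sequence (such as a lone 0xF6) makes this false.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DiskBytes <file>");
            return;
        }
        // Reads the bytes exactly as stored on disk, with no editor in between.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hex(bytes));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

On JDK 11 or later it can be run straight from source, for example java DiskBytes.java result.xml, where result.xml is a placeholder for your output file.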
