Encoding problems with Python's etree.tostring

I am using xml.etree.cElementTree from Python 2.6.2 to create an XML document:

    import xml.etree.cElementTree as etree
    elem = etree.Element('tag')
    elem.text = (u"Würth Elektronik Midcom").encode('utf-8')
    xml = etree.tostring(elem, encoding='UTF-8')

The resulting XML looks like this:

    <?xml version='1.0' encoding='UTF-8'?>
    <tag>W&#195;&#188;rth Elektronik Midcom</tag>

It seems that tostring ignored the encoding parameter and encoded "ü" into some other character encoding ("ü" is a valid character in UTF-8, I'm sure).

Any advice regarding what I am doing wrong will be greatly appreciated.

2 answers

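You are encoding the text twice: once with your own call to .encode('utf-8'), and again inside tostring. The already-encoded byte string is treated as text, so each of the two UTF-8 bytes of "ü" (195 and 188) is written out as its own character reference, which is the &#195;&#188; you see. Assign the unicode string directly and let tostring do the encoding: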

    import xml.etree.cElementTree as etree
    elem = etree.Element('tag')
    elem.text = u"Würth Elektronik Midcom"
    xml = etree.tostring(elem, encoding='UTF-8')
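To see where the two character references come from, here is a minimal sketch for Python 3, using xml.etree.ElementTree rather than the question's Python 2 cElementTree. Python 3 does not treat byte strings as text, so the double handling is only imitated here by decoding the UTF-8 bytes as Latin-1; the variable names are illustrative.

```python
import xml.etree.ElementTree as etree

text = "Würth Elektronik Midcom"

# UTF-8 stores "ü" as two bytes, 0xC3 and 0xBC (decimal 195 and 188).
utf8_bytes = list("ü".encode("utf-8"))

# Imitate the double handling: read the UTF-8 bytes back as if
# every byte were one character (assumption: Latin-1 stands in
# for what Python 2 did with the byte string).
mangled = text.encode("utf-8").decode("latin-1")

bad = etree.Element("tag")
bad.text = mangled
# An ASCII-only serialization makes the two stray characters
# visible as numeric character references.
bad_xml = etree.tostring(bad, encoding="us-ascii")

# The fix: hand the text to ElementTree unencoded and let
# tostring perform the single, final encoding step.
good = etree.Element("tag")
good.text = text
good_xml = etree.tostring(good, encoding="utf-8")

print(utf8_bytes)
print(bad_xml)
print(good_xml)
```

The mangled element serializes with &#195;&#188; in place of "ü", while the unencoded one contains the plain two-byte UTF-8 sequence.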

    etree.tostring(elem, encoding=str)

With lxml on Python 3, this returns a str instead of bytes. From the lxml documentation:

You can also serialize to a Unicode string without declaration by passing the unicode function as encoding (or str in Py3), or the name "unicode". This changes the return value from a byte string to an unencoded unicode string.
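With the standard library's xml.etree.ElementTree on Python 3, the same result is available through the name "unicode". A minimal sketch, with illustrative variable names:

```python
import xml.etree.ElementTree as etree

elem = etree.Element("tag")
elem.text = "Würth Elektronik Midcom"

# encoding="unicode" returns text (str) with no XML declaration.
as_text = etree.tostring(elem, encoding="unicode")

# A real encoding name returns bytes in that encoding.
as_bytes = etree.tostring(elem, encoding="utf-8")

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes)
```

The text form is convenient for logging or further string processing; encode it yourself only when writing to a file or socket.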
