lxml XMLSyntaxError when parsing named character entities

I use lxml as follows to parse an XML file exported from another system:

from lxml import etree

xmldoc = open(filename)
etree.parse(xmldoc)

But I get:

lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

Obviously lxml has a problem with the named entity, but how do I get around this? Via open() or parse()?

Edit: I forgot to put my DTD in the same folder. It is there now and contains the following declaration:

 <!ENTITY eacute "&#233;"> 

and it is referenced (and always has been) in xmldoc like this:

 <?xml version="1.0" encoding="ISO-8859-1" ?>
 <!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">

But I'm still getting the same problem ... do I need to declare DTDs in Python as well?

1 answer

eacute is not a predefined entity in XML. To use the reference &eacute; in an XML file, the file must have a <!DOCTYPE> declaration pointing to a DTD (for example, the XHTML 1.0 DTD) that defines the entity.

If the XML uses &eacute; but does not have a <!DOCTYPE>, it is not well-formed, and the system exporting it must be fixed.
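For the setup in the question, where the file does carry a DOCTYPE pointing at foo.dtd, it may help to know that lxml's default parser does not load an external DTD. The following is a minimal sketch, assuming the DTD sits next to the XML file; the temporary directory, the file name export.xml and the sample element content are made up for illustration, and only the DOCTYPE line and the entity declaration come from the question.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-ins for the DTD and the exported file from the question.
DTD_TEXT = '<!ENTITY eacute "&#233;">\n'
XML_TEXT = (
    '<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
    '<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">\n'
    '<DScribeDatabase>caf&eacute;</DScribeDatabase>\n'
)

with tempfile.TemporaryDirectory() as workdir:
    # Write the DTD and the XML file side by side, as in the question.
    with open(os.path.join(workdir, "foo.dtd"), "w", encoding="ascii") as f:
        f.write(DTD_TEXT)
    xml_path = os.path.join(workdir, "export.xml")
    with open(xml_path, "w", encoding="iso-8859-1") as f:
        f.write(XML_TEXT)

    # load_dtd=True asks the parser to read the external DTD named in the
    # DOCTYPE, so the entity definition is available while parsing.
    parser = etree.XMLParser(load_dtd=True)
    tree = etree.parse(xml_path, parser)

    # The entity reference has been replaced using the DTD's definition.
    text = tree.getroot().text
```

Parsing by path gives lxml a base location for resolving the relative system identifier foo.dtd.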

(It makes no sense to use an entity reference to represent é in an XML file. A character reference, &#233;, is understood everywhere without any entity definitions, if the file cannot just include the raw UTF-8 é for some reason.)
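The difference between the two kinds of reference can be sketched with a small experiment. This one uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made only to keep the sketch dependency-free), and the element name and sample text are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Named entity with no DTD in sight: the parser has no definition for it.
try:
    ET.fromstring("<name>Caf&eacute;</name>")
    named_ok = True
except ET.ParseError:
    named_ok = False

# Numeric character reference: needs no declaration at all.
numeric_text = ET.fromstring("<name>Caf&#233;</name>").text

# The same named entity works once the document itself declares it,
# here in an internal DTD subset.
declared = (
    '<!DOCTYPE name [<!ENTITY eacute "&#233;">]>'
    "<name>Caf&eacute;</name>"
)
declared_text = ET.fromstring(declared).text
```

Here named_ok ends up False, while both numeric_text and declared_text contain the accented character.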
