Python - lxml: enforcing a specific order of attributes

I have a script that outputs XML for a specific third-party tool.

I used the source XML as a template to make sure that I am building all the right elements, but the final XML does not look like the original.

I write attributes in the same order, but lxml writes them in its own order.

I'm not sure, but I suspect the third-party tool expects the attributes to appear in a specific order. I would like to rule this out so that I can see whether the attribute order is what makes it fail, or something else.

Source Element:

<FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="text/x-test-signature"> 

My script source:

    sig.fileformat = etree.SubElement(
        sig.fileformats, "FileFormat",
        ID=str(db.ID), Name=db.name,
        PUID="fileSig/{}".format(str(db.ID)),
        Version="", MIMEType="")

My resulting XML:

 <FileFormat MIMEType="" PUID="fileSig/19" Version="" Name="Printer Info File" ID="19"> 

Is there a way to control the order in which they are written?

python xml lxml
3 answers

Attribute order and readability

As commenters have noted, attribute order has no semantic significance in XML, i.e. it does not change the meaning of an element:

    <tag attr1="val1" attr2="val2"/>
    <!-- means the same thing as: -->
    <tag attr2="val2" attr1="val1"/>

SQL has a similar characteristic: the order of the columns does not change the meaning of a table definition. XML attributes and SQL columns each form a set (not an ordered set), so all that can "officially" be said about either is whether a given attribute or column is present in the set.
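The equivalence can be checked with any conforming parser. The following is a minimal sketch using the standard library's xml.etree.ElementTree (chosen here only because it needs no installation); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same element written with two different attribute orders.
a = ET.fromstring('<tag attr1="val1" attr2="val2"/>')
b = ET.fromstring('<tag attr2="val2" attr1="val1"/>')

# Parsed attributes are exposed as a mapping, and mapping equality
# ignores the order in which the keys were written.
same_attributes = (a.attrib == b.attrib)
print(same_attributes)
```

To a parser, the two elements carry the same information, which is why a conforming consumer should not care about the order.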

However, for human readability the order in which these things appear definitely matters. Where constructs like this are authored and appear in text (for example, source code) and must be read and interpreted, careful ordering makes a lot of sense to me.

Typical parser behavior

Any XML parser that treats the order of attributes as meaningful would not conform to the XML standard. That does not mean it cannot happen, but in my experience it is certainly unusual. Still, depending on the origin of the tool you mention, it is a possibility that may be worth testing.

As far as I know, lxml does not have a mechanism for specifying the order of attributes in serialized XML, and I would be surprised if it did.

To test the behavior, I would be strongly inclined to just write a text template that generates enough XML to try it out:

    id = 1
    name = 'Development Signature'
    puid = 'dev/1'
    version = '1.0'
    mimetype = 'text/x-test-signature'
    template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
                'MIMEType="%s">')
    xml = template % (id, name, puid, version, mimetype)
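One caveat with a plain string template is that values containing characters such as & or < would produce malformed XML. A hedged variant of the same idea, assuming the standard library helper xml.sax.saxutils.quoteattr is acceptable for the escaping, could look like this; file_format_tag is a hypothetical helper name, not part of lxml or of the original script.

```python
from xml.sax.saxutils import quoteattr


def file_format_tag(file_id, name, puid, version="", mimetype=""):
    """Return a <FileFormat ...> start tag with attributes in a fixed order.

    Hypothetical helper for testing the third-party tool only.
    """
    pairs = [
        ("ID", str(file_id)),
        ("Name", name),
        ("PUID", puid),
        ("Version", version),
        ("MIMEType", mimetype),
    ]
    # quoteattr() escapes &, < and > and adds the surrounding quotes.
    attrs = " ".join("{}={}".format(key, quoteattr(value)) for key, value in pairs)
    return "<FileFormat {}>".format(attrs)


tag = file_format_tag(1, "Development Signature", "dev/1", "1.0",
                      "text/x-test-signature")
print(tag)
```

Because the list of pairs is written out explicitly, the attribute order in the output is exactly the order in the list.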

It seems that lxml serializes the attributes in the order in which you set them:

    >>> from lxml import etree as ET
    >>> x = ET.Element("x")
    >>> x.set('a', '1')
    >>> x.set('b', '2')
    >>> ET.tostring(x)
    '<x a="1" b="2"/>'
    >>> y = ET.Element("y")
    >>> y.set('b', '2')
    >>> y.set('a', '1')
    >>> ET.tostring(y)
    '<y b="2" a="1"/>'

Note that when you pass attributes as keyword arguments to the ET.SubElement() constructor, Python builds a dictionary of the keyword arguments and passes that dictionary to lxml. This loses any ordering you had in the source file, because Python dictionaries are unordered (or rather, their order is determined by string hash values, which may vary from platform to platform or even from one execution to the next). Since Python 3.7 dictionaries preserve insertion order, and keyword arguments have kept their order since Python 3.6, so this explanation describes older interpreters.
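Applied to the element from the question, the set-in-order approach might be sketched as follows. This assumes lxml is installed; ordered_subelement and the FileFormatCollection root are hypothetical names used only for illustration.

```python
from lxml import etree


def ordered_subelement(parent, tag, pairs):
    """Create a child element and set its attributes one at a time, in order.

    Hypothetical helper: avoids keyword arguments entirely, so no
    intermediate dictionary is involved.
    """
    child = etree.SubElement(parent, tag)
    for key, value in pairs:
        child.set(key, value)
    return child


# Placeholder parent element standing in for sig.fileformats.
root = etree.Element("FileFormatCollection")
ordered_subelement(root, "FileFormat", [
    ("ID", "19"),
    ("Name", "Printer Info File"),
    ("PUID", "fileSig/19"),
    ("Version", ""),
    ("MIMEType", ""),
])

xml_text = etree.tostring(root, encoding="unicode")
print(xml_text)
```

Passing a list of pairs rather than keyword arguments keeps the order explicit in the calling code, whichever Python version runs it.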


OrderedDict Attributes

As of lxml 3.3.3 (possibly also in earlier versions), you can pass an OrderedDict of attributes to the lxml.etree.(Sub)Element constructor, and the order will be preserved when using lxml.etree.tostring(root):

    sig.fileformat = etree.SubElement(
        sig.fileformats, "FileFormat",
        OrderedDict([("ID", str(db.ID)),
                     ("Name", db.name),
                     ("PUID", "fileSig/{}".format(str(db.ID))),
                     ("Version", ""),
                     ("MIMEType", "")]))

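Note that the standard library ElementTree API (xml.etree.ElementTree) does not preserve the order of the attributes, even if you pass an OrderedDict to the xml.etree.ElementTree.(Sub)Element constructor! (This changed in Python 3.8, where xml.etree.ElementTree started preserving the order in which attributes were specified.)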

UPDATE: Also note that using the **extra parameter of the lxml.etree.(Sub)Element constructor to specify attributes does not preserve the order of the attributes:

    >>> from lxml.etree import Element, tostring
    >>> from collections import OrderedDict
    >>> root = Element("root", OrderedDict([("b","1"),("a","2")]))  # attrib parameter
    >>> tostring(root)
    b'<root b="1" a="2"/>'  # preserved
    >>> root = Element("root", b="1", a="2")  # **extra parameter
    >>> tostring(root)
    b'<root a="2" b="1"/>'  # not preserved