How to quickly write XML output from Python

What is a quick way to write an XML file iteratively (i.e. without having the entire document in memory)? xml.sax.saxutils.XMLGenerator works, but it is slow: about 1 MB/s on an i7 machine. Here is an example.
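The example the question links to is not reproduced on this page. As a rough illustration of the approach being described, here is a minimal sketch of streaming records through xml.sax.saxutils.XMLGenerator. The function name write_records, the element names, and the record layout are assumptions made up for this sketch, not the asker's code.

```python
from xml.sax.saxutils import XMLGenerator


def write_records(out, records, encoding="utf-8"):
    """Stream (id, text) pairs to `out` as XML, one record at a time.

    `out` is a file object opened in binary mode (or an io.BytesIO).
    `records` can be any iterable, including a generator, so the whole
    document never has to exist in memory.
    Hypothetical helper: the name and element layout are illustrative only.
    """
    gen = XMLGenerator(out, encoding=encoding)
    gen.startDocument()                      # writes the XML declaration
    gen.startElement("root", {})
    for rec_id, text in records:
        gen.startElement("record", {"id": str(rec_id)})
        gen.characters(text)                 # special characters are escaped here
        gen.endElement("record")
    gen.endElement("root")
    gen.endDocument()


if __name__ == "__main__":
    # Records come from a generator expression, so nothing is buffered up front.
    with open("sax_streamed.xml", "wb") as f:
        write_records(f, ((i, "record text data") for i in range(10)))
```

Each element is emitted through several Python-level method calls, which is probably where most of the time goes when writing large files this way.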

2 answers

I realize this question was asked some time ago, but in the meantime lxml has introduced an API that looks promising for this problem: http://lxml.de/api.html ; in particular, see the section "Incremental XML generation".

I quickly tested it by writing out a 10 MB file in the same way as in your benchmark, and it took a few seconds on my old laptop. That is by no means very scientific, but it is roughly comparable to your generate_large_xml().
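For anyone who wants to repeat this kind of informal measurement, here is a small timing sketch that uses only the standard library. The names measure_throughput and sax_writer are hypothetical, and sax_writer is only a stand-in workload, not the generate_large_xml() from the question. Any function that writes to a binary file object can be passed in its place.

```python
import os
import tempfile
import time
from xml.sax.saxutils import XMLGenerator


def measure_throughput(write_fn):
    """Call write_fn(binary_file) on a temp file; return (bytes_written, seconds)."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as f:
            start = time.perf_counter()
            write_fn(f)
            f.flush()
            elapsed = time.perf_counter() - start
        # Measure the size only after the file has been closed.
        return os.path.getsize(path), elapsed
    finally:
        os.remove(path)


def sax_writer(f, n_records=100_000):
    # Stand-in workload (an assumption for this sketch): many small records.
    gen = XMLGenerator(f, encoding="utf-8")
    gen.startDocument()
    gen.startElement("root", {})
    for i in range(n_records):
        gen.startElement("record", {"id": str(i)})
        gen.characters("record text data")
        gen.endElement("record")
    gen.endElement("root")
    gen.endDocument()


if __name__ == "__main__":
    size, secs = measure_throughput(sax_writer)
    print(f"{size / 1e6:.1f} MB in {secs:.2f} s ({size / 1e6 / secs:.1f} MB/s)")
```

A single run like this is still only a ballpark figure; disk caching and machine load can change the numbers considerably.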


As Yuri V. Zaitsev noted, lxml really does provide an API for streaming XML documents.

Here is a working example:

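from lxml import etree

fname = "streamed.xml"
# xmlfile writes encoded bytes, so open the file in binary mode
with open(fname, "wb") as f, etree.xmlfile(f) as xf:
    attribs = {"tag": "bagggg", "text": "att text", "published": "now"}
    # Everything written inside this block ends up inside <root>
    with xf.element("root", attribs):
        xf.write("root text\n")
        for i in range(10):
            # Build one small element at a time and serialize it immediately
            rec = etree.Element("record", id=str(i))
            rec.text = "record text data"
            xf.write(rec)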

The XML result looks like this (the content is reformatted from a single-line XML document):

<?xml version="1.0"?>
<root text="att text" tag="bagggg" published="now">root text
    <record id="0">record text data</record>
    <record id="1">record text data</record>
    <record id="2">record text data</record>
    <record id="3">record text data</record>
    <record id="4">record text data</record>
    <record id="5">record text data</record>
    <record id="6">record text data</record>
    <record id="7">record text data</record>
    <record id="8">record text data</record>
    <record id="9">record text data</record>
</root>