Perhaps you could use a template engine instead of generating / building xml yourself?
Genshi , for example, is xml-based and supports streaming output. A very simple example:
from genshi.template import MarkupTemplate tpl_xml = ''' <doc xmlns:py="http://genshi.edgewall.org/"> <p py:for="i in data">${i}</p> </doc> ''' tpl = MarkupTemplate(tpl_xml) stream = tpl.generate(data=xrange(10000000)) with open('output.xml', 'w') as f: stream.render(out=f)
This may take some time, but memory usage remains low.
The same example for the Mako template (not "native" xml), but much faster:
from mako.template import Template from mako.runtime import Context tpl_xml = ''' <doc> % for i in data: <p>${i}</p> % endfor </doc> ''' tpl = Template(tpl_xml) with open('output.xml', 'w') as f: ctx = Context(f, data=xrange(10000000)) tpl.render_context(ctx)
The last example worked on my laptop for about 20 seconds, creating (admittedly very simple) 151 MB xml file, no memory problems whatsoever. (according to the Windows task manager, it remained constant for about 10 MB)
Depending on your needs, this may be a friendlier and faster way to generate xml than using SAX, etc. Check the docs to find out what you can do with these engines (there are others, I just chose these two examples)
Steven
source share