Some XML parsers support incremental parsing: the parser can start working on a document before it has been completely downloaded. The XMLTreeBuilder class from the xml.etree.ElementTree module in the Python standard library is one such parser.
As the example below shows, you can pass data to the parser each time you read a chunk from your input source. The corresponding hook methods in your handler class are called as XML "events" occur (the start of a tag, the character data inside a tag, and the end of a tag), which lets you process the data while the XML document is still loading:
from xml.etree.ElementTree import XMLTreeBuilder

class MyHandler(object):
    def start(self, tag, attrib):
        # Called for each opening tag.
        print(tag + " started")

    def end(self, tag):
        # Called for each closing tag.
        print(tag + " ended")

    def data(self, data):
        # Called when character data is read from inside a tag.
        print(data + " data read")

    def close(self):
        # Called when all data has been parsed.
        print("All data read")

handler = MyHandler()
parser = XMLTreeBuilder(target=handler)
# Each chunk is passed as a string, as it would arrive from the input source.
parser.feed("<sometag>")
parser.feed("<sometag-child-tag>text")
parser.feed("</sometag-child-tag>")
parser.feed("</sometag>")
parser.close()
In this example, the handler receives five parsing events, followed by the final close() call, and prints:
sometag started
sometag-child-tag started
text data read
sometag-child-tag ended
sometag ended
All data read
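
XMLTreeBuilder is an older name and may not be available on current Python 3 releases, where the same feed()/close() interface is provided by XMLParser. The sketch below is one possible way to apply the approach above when reading from a real input source in chunks. RecordingHandler, parse_in_chunks, and the chunk size are hypothetical names and values chosen for illustration, and io.BytesIO stands in for a network or file stream. Note that character data may be delivered to data() in more than one piece, depending on where chunk boundaries fall.

```python
import io
from xml.etree.ElementTree import XMLParser


class RecordingHandler(object):
    """Hypothetical target that records events instead of printing them."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        # Called for each opening tag.
        self.events.append(("start", tag))

    def end(self, tag):
        # Called for each closing tag.
        self.events.append(("end", tag))

    def data(self, data):
        # Called for character data; may fire several times per text node.
        self.events.append(("data", data))

    def close(self):
        # Called once by parser.close(); its return value is passed back.
        self.events.append(("close", None))
        return self.events


def parse_in_chunks(source, chunk_size=16):
    """Feed a file-like object to the parser chunk_size bytes at a time."""
    parser = XMLParser(target=RecordingHandler())
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()


# io.BytesIO is a stand-in for a socket or a partially downloaded file.
source = io.BytesIO(
    b"<sometag><sometag-child-tag>text</sometag-child-tag></sometag>"
)
for event in parse_in_chunks(source):
    print(event)
```

Because the handler is called as soon as each chunk is fed, processing can begin before the rest of the document has arrived. Code that needs the full text of an element should concatenate the pieces passed to data() until the matching end() call.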