Close all open XML tags

I have a file whose contents change frequently, and I would like to read it before it is finished. The problem is that it is an XML file (a log), so at the moment I read it, some tags may not be closed yet.

I would like to know whether it is possible to close all open tags correctly, so that the file can be displayed in a browser without problems (using an XSLT stylesheet). This should be done using only what ships with Python.

+4
4 answers

Some XML parsers support incremental parsing, meaning the parser can start working on a document before all of it is available. The XMLTreeBuilder class from the xml.etree.ElementTree module in the Python standard library is one such parser (newer Python versions call it XMLParser, and XMLTreeBuilder was removed in Python 3.9):

As the example below shows, you can feed data to the parser as you read it from your input source. The corresponding hook methods of your handler class are called as the various XML "events" occur (start of a tag, character data inside a tag, end of a tag), which lets you process the data while the XML document is still loading:

    # XMLParser is the current name of this class; old Python releases
    # exposed it as XMLTreeBuilder (removed in Python 3.9).
    from xml.etree.ElementTree import XMLParser

    class MyHandler(object):
        def start(self, tag, attrib):
            # Called for each opening tag.
            print(tag + " started")

        def end(self, tag):
            # Called for each closing tag.
            print(tag + " ended")

        def data(self, data):
            # Called when character data is read from inside a tag.
            print(data + " data read")

        def close(self):
            # Called when all data has been parsed.
            print("All data read")

    handler = MyHandler()
    parser = XMLParser(target=handler)
    # Feed the document piece by piece, as it becomes available.
    parser.feed("<sometag>")
    parser.feed("<sometag-child-tag>text")
    parser.feed("</sometag-child-tag>")
    parser.feed("</sometag>")
    parser.close()

In this example, the handler receives five parsing events, followed by the call to close(), and prints:

    sometag started
    sometag-child-tag started
    text data read
    sometag-child-tag ended
    sometag ended
    All data read

+5

If I understand your question correctly, you have a log file that is continuously appended to, so you get something like:

    <root>
      <entry> ... </entry>
      <entry> ... </entry>
      ...
      <entry> ... </entry>
    <!-- no closing root -->

In this case, you do NOT want to use a DOM parser, because it tries to read the complete document and chokes on the missing tag. A SAX or pull parser will work instead, because it reads the document as a stream of data rather than as a complete tree. As Denis answered above, you can either close the missing tag at the end or drop any incomplete tags before writing the result out.
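As a rough illustration of the pull-parser approach, here is a minimal sketch using XMLPullParser from the standard library (available since Python 3.4). The function name read_complete_entries and the "entry" tag are assumptions made for this example, taken from the sample log above rather than from any real file format. It collects only the entries that are already closed and ignores anything unfinished.

```python
from xml.etree.ElementTree import XMLPullParser


def read_complete_entries(partial_xml, entry_tag="entry"):
    """Return the text of every entry element that is already closed.

    partial_xml may stop anywhere, e.g. before </root> or in the
    middle of an unfinished entry.
    """
    # Only ask for "end" events: an element that has ended is complete.
    parser = XMLPullParser(events=("end",))
    parser.feed(partial_xml)
    # Deliberately no parser.close(): on an unfinished document it
    # would raise ParseError because the root is still open.
    return [
        elem.text
        for _event, elem in parser.read_events()
        if elem.tag == entry_tag
    ]


if __name__ == "__main__":
    log = "<root><entry>first</entry><entry>second</entry><entry>thi"
    print(read_complete_entries(log))
```

The unfinished third entry never produces an "end" event, so it is simply skipped until more of the file has been written.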

XML parsing on Wikipedia

+1

You can use any SAX parser, feeding it whatever data is available so far. Use a SAX handler that simply reconstructs the source XML, keeps a stack of the open tags, and closes them in reverse order at the end.
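The idea above could be sketched roughly as follows with the standard library's xml.sax package. The names TagClosingGenerator and close_open_tags are made up for this example. It relies on XMLGenerator from xml.sax.saxutils to re-emit the XML, and it assumes the default expat-based parser, which supports incremental feed() calls. Treat it as a starting point rather than a finished tool.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class TagClosingGenerator(XMLGenerator):
    """Re-emits the parsed XML and remembers which elements are still open."""

    def __init__(self, out):
        XMLGenerator.__init__(self, out, encoding="utf-8")
        self.open_tags = []  # stack of element names not yet closed

    def startElement(self, name, attrs):
        self.open_tags.append(name)
        XMLGenerator.startElement(self, name, attrs)

    def endElement(self, name):
        self.open_tags.pop()
        XMLGenerator.endElement(self, name)


def close_open_tags(partial_xml):
    """Return partial_xml as a complete document with all open tags closed."""
    out = io.StringIO()
    handler = TagClosingGenerator(out)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(partial_xml)
    # parser.close() is not called on purpose: it would report an error
    # for the unfinished document. Instead, write the missing end tags
    # ourselves, innermost element first.
    while handler.open_tags:
        XMLGenerator.endElement(handler, handler.open_tags.pop())
    return out.getvalue()


if __name__ == "__main__":
    print(close_open_tags("<root><entry>a</entry><entry>b"))
```

A tag that is cut off in the middle (for example a trailing "<ent") is not reported by the parser yet, so it does not appear in the output. Comments and the original whitespace outside of elements are not preserved by this sketch.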

0

You can use BeautifulStoneSoup (the XML part of BeautifulSoup).

www.crummy.com/software/BeautifulSoup

This is not ideal, but it will get around the problem if you cannot fix the file output ...

This is basically a ready-made implementation of what Denis said.

You can simply feed whatever you have to the soup, and it will do its best to fix it.

0