The challenge is to parse a simple XML document and process its contents by line number.
The right Python package seems to be xml.sax. But how do I use it?
After some digging in the documentation, I found:
- The xmlreader.Locator interface has the information: getLineNumber().
- The handler.ContentHandler interface has setDocumentLocator().
My first thought was to create a Locator, pass it to the ContentHandler, and read the information off the Locator during calls to the handler's methods (characters(), etc.).
BUT, xmlreader.Locator is only a skeleton interface, and its getLineNumber() and getColumnNumber() methods can only return -1. So, as a poor user, WHAT am I to do, short of writing a whole Parser and Locator of my own?
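To make the problem concrete, here is a minimal sketch of what the bare interface gives you. It only uses xmlreader.Locator from the standard library; the variable names are mine.

```python
from xml.sax import xmlreader

# The base Locator is only an interface: there is no parser behind it,
# so it has no position to report.
loc = xmlreader.Locator()

line = loc.getLineNumber()
column = loc.getColumnNumber()
print(line, column)  # both are the placeholder value -1
```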
I'll answer my own question now.
(Well, I would have, except for an arbitrary, annoying rule that says I can't.)
I was unable to figure this out from the existing documentation (or from web searches), and was forced to read the source code for xml.sax (under /usr/lib/python2.7/xml/sax/ on my system).
The xml.sax function make_parser() creates a real Parser by default, but what kind of thing is that?
In the source code one finds that it is an ExpatParser, defined in expatreader.py. And ... it has its own Locator, an ExpatLocator. But there is no obvious way to get at this thing. Much head-scratching lay between this discovery and a solution.
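You can check what make_parser() hands back without reading the source. This is a small sketch that assumes a standard CPython install where the expat-based reader is the default; other setups could report a different class.

```python
import xml.sax

# Ask the default factory for a parser and inspect its concrete class.
parser = xml.sax.make_parser()
parser_class = type(parser)

# On a standard CPython build this names the ExpatParser class
# from the xml.sax.expatreader module.
print(parser_class.__module__, parser_class.__name__)
```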
- write your own ContentHandler that knows about a Locator and uses it to determine line numbers
- create an ExpatParser with xml.sax.make_parser()
- create an ExpatLocator, passing it the ExpatParser instance
- create the ContentHandler, giving it this ExpatLocator
- pass the ContentHandler to the parser's setContentHandler()
- call parse() on the Parser
For example:
    from __future__ import print_function  # same code runs on Python 2.7 and 3

    import sys
    import xml.sax
    import xml.sax.expatreader
    import xml.sax.handler


    class EltHandler(xml.sax.handler.ContentHandler):
        """Print every chunk of character data with its line number."""

        def __init__(self, locator):
            xml.sax.handler.ContentHandler.__init__(self)
            # Keep our own reference to the locator built in spit_lines().
            self.loc = locator
            self.setDocumentLocator(self.loc)

        def startElement(self, name, attrs):
            pass

        def endElement(self, name):
            pass

        def characters(self, data):
            line_no = self.loc.getLineNumber()
            print("LINE", line_no, data)


    def spit_lines(filepath):
        try:
            parser = xml.sax.make_parser()
            # ExpatLocator reads the current position from the parser it wraps.
            locator = xml.sax.expatreader.ExpatLocator(parser)
            handler = EltHandler(locator)
            parser.setContentHandler(handler)
            parser.parse(filepath)
        except IOError as e:
            print(e, file=sys.stderr)


    if len(sys.argv) > 1:
        spit_lines(sys.argv[1])
    else:
        print("Try providing a path to an XML file.", file=sys.stderr)
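Martijn Pieters points out below another approach with some advantages.
If the superclass initializer of the ContentHandler is properly called,
then it turns out that a private-looking, undocumented member, ._locator, is set,
which ends up holding a proper Locator during parsing.
Advantage: you don't have to create your own Locator (or find out how to create one).
Disadvantage: it is not documented anywhere, and relying on an undocumented private variable is sloppy.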
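The ._locator alternative can be sketched roughly as follows. LineHandler, its seen list, and the sample document are made up for illustration, and the code leans on the undocumented fact that the parser calls setDocumentLocator() on the handler when parsing starts, so treat it as a sketch rather than a supported recipe.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineHandler(ContentHandler):
    """Hypothetical handler that records (line, text) pairs."""

    def __init__(self):
        # The base initializer creates self._locator (initially None);
        # the parser fills it in via setDocumentLocator() when parsing starts.
        ContentHandler.__init__(self)
        self.seen = []

    def characters(self, data):
        # Skip whitespace-only chunks between elements.
        if data.strip():
            self.seen.append((self._locator.getLineNumber(), data))


handler = LineHandler()
xml.sax.parseString(b"<a>\n<b>hello</b>\n<c>world</c>\n</a>", handler)
print(handler.seen)
```

No Locator is constructed by hand here; the trade-off is the dependence on a private attribute.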
Thanks, Martijn!