Processing an infinite XML stream in Python

I am developing a sample Python interface for an application. The application serves an XML-based document that I can fetch with an HTTP GET request. The problem is that the document is infinite: it never reaches its final closing element. I know a document like this has to be processed with SAX, but how do I deal with the fact that it never ends? Any ideas or sample code?

+5
7 answers

This is what I use to parse an endless XML stream that I receive from a remote computer. In my case I connect through a socket and use socket.makefile('r') to create the file object:

19.12.2. IncrementalParser

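import xml.sax

# The default (expat) parser already supports the IncrementalParser
# interface (feed/close), so no parser list is needed here.
parser = xml.sax.make_parser()
handler = FooHandler()  # your own ContentHandler subclass
parser.setContentHandler(handler)

# sockfile is the file object returned by socket.makefile('r')
data = sockfile.readline()
while data:
    parser.feed(data)
    data = sockfile.readline()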
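
To make the feed() approach above concrete, here is a minimal self-contained sketch. The handler class, the helper function and the sample chunks are made up for illustration; a list of strings stands in for the socket. It shows that the chunks do not have to line up with tag boundaries and that the root element never has to close:

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: records the name of every element that closes."""

    def __init__(self):
        super().__init__()
        self.closed = []

    def endElement(self, name):
        self.closed.append(name)


def feed_chunks(chunks):
    """Feed an iterable of text chunks to an incremental SAX parser."""
    parser = xml.sax.make_parser()
    handler = CountingHandler()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    # close() is deliberately not called: the root element never ends.
    return handler.closed


# The first chunk ends in the middle of a closing tag, and the root
# <foo> is never closed. Only the completed <bar> elements are reported.
closed = feed_chunks(["<foo><bar>1</b", "ar><bar>2</bar>", "<bar>3"])
print(closed)
```

Calling parser.close() on such a stream would raise an error because the document is incomplete, so in this sketch the parser is simply left open.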
+6

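You could use xmlstream from jabberpy (or the one from twisted):

xmlstream.py provides simple functionality for implementing XML-stream-based network protocols. It is used as a base for jabber.py.

xmlstream.py manages the network connectivity and XML parsing of the stream. When a complete "protocol element" (meaning a complete child of the xmlstream's root) is parsed, the dispatch method is called with a 'Node' instance of this structure. The Node class is a very simple XML DOM-like class for manipulating XML documents, or "protocol elements" in this case.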

+3

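Without a final close tag the document is not well-formed XML, so strictly speaking it is not XML at all.

However, because the Python SAX2 API is event-driven, the parser reports each element as it is read and does not need to see the final close-tag first.

For example, suppose the XML document looks like this: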

<?xml version="1.0"?>
<foo>
  <bar>...</bar>
  <bar>...</bar>
  <bar>...</bar>
  <bar>...</bar>
  ...

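and never reaches the closing </foo>. With SAX, every bar element still triggers a startElement(bar) and an endElement(bar) event as it arrives.

In short: start collecting data when a bar opens, and process it when that bar closes. Because SAX does not build a tree, the sax-parser can keep running for as long as the stream does.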

+2

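Strictly speaking, XML that never ends is not XML, but a stream like this can still be handled. Given something like: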

<items>
  <item>
    <!-- content here -->
  </item>
  <item>
    <!-- content here -->
  </item>
  <item>
    <!-- content here -->
  </item>
</items>

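With SAX, you can process each item as soon as its closing tag arrives, without waiting for the end of the document.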

import xml.sax.handler

def process(item):
    # App logic goes here
    pass

class ItemsHandler(xml.sax.handler.ContentHandler):
    # Omitting __init__, startElement, and characters methods
    # to store data on a stack during processing

    def endElement(self, name):
        if name == "item":
            # create item from stored data on stack
            parsed_item = self.parse_item_from_stack()
            process(parsed_item)

That way SAX only needs the end of each item, never the end of the document.
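
For completeness, here is one way the omitted methods might look. This is only a sketch under simple assumptions: each item contains plain text, a text buffer replaces the stack, and a callback replaces process(). None of these details come from the handler above:

```python
import xml.sax


class ItemsHandler(xml.sax.ContentHandler):
    """Collects the text of each <item> and hands it to a callback."""

    def __init__(self, on_item):
        super().__init__()
        self.on_item = on_item
        self.buffer = None  # None means "not inside an <item>"

    def startElement(self, name, attrs):
        if name == "item":
            self.buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node,
        # so the pieces are collected and joined at the end tag.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "item" and self.buffer is not None:
            self.on_item("".join(self.buffer).strip())
            self.buffer = None


received = []
parser = xml.sax.make_parser()
parser.setContentHandler(ItemsHandler(received.append))

# The stream below never closes <items>; the completed items still arrive.
for chunk in ["<items><item>first</item>", "<item>sec", "ond</item><item>thi"]:
    parser.feed(chunk)

print(received)
```

The third item is never delivered because its closing tag has not arrived yet, which is exactly the behaviour you want on a live stream.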

0

, ( ) , ? Python, </endtag> ?

0

Python , .

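For XML streams like this, StAX is a good fit. Unlike a SAX parser, which pushes events into your callbacks, StAX lets your code pull the next event when it is ready. StAX is widely used for streaming XML (for example SOAP).

I don't know of a StAX implementation for Python, though.

UPD: lxml (a binding to libxml2) appears to support this.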
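
For what it's worth, the Python standard library (3.4 and later) ships a pull-style API, xml.etree.ElementTree.XMLPullParser, which is close in spirit to StAX: you feed it data and then pull whatever events are ready. A minimal sketch, assuming the stream consists of <entry> elements under a root <feed> that never closes (both names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Ask only for "end" events, i.e. elements that are completely parsed.
puller = ET.XMLPullParser(events=("end",))

titles = []
for chunk in ["<feed><entry>a</entry><en", "try>b</entry><entry>c"]:
    puller.feed(chunk)
    # Pull the events that became available after this chunk.
    for event, elem in puller.read_events():
        if elem.tag == "entry":
            titles.append(elem.text)
            elem.clear()  # release the element's contents once handled

print(titles)
```

The root <feed> never produces an "end" event here, so the loop only ever sees finished entries.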

0

Use iterparse from xml.etree.ElementTree (or cElementTree), which is in the stdlib. (lxml provides it as well.)

See: http://effbot.org/zone/element-iterparse.htm#incremental-parsing

, . , . ( ). : .

And it's in the stdlib ;-)
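
A minimal sketch of the iterparse approach, assuming <item> elements directly under the root. The helper name iter_items is made up, and a finite in-memory document stands in for the real stream so the example terminates:

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, tag="item"):
    """Yield the text of each completed <tag> element, freeing memory as we go."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            root.clear()  # drop finished children so the tree does not grow


# Stand-in for the real stream: any file-like object opened for reading.
sample = io.BytesIO(b"<items><item>a</item><item>b</item></items>")
texts = list(iter_items(sample))
print(texts)
```

With a truly endless source the generator simply never finishes. Note that iterparse reads the source in blocks, so on a slow live stream items may show up with some delay.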

0
