Python: parsing incomplete XML fragments

Question

I get XML fragments from the server. These pieces are not complete segments, but may look like this:

chunk1 = '<el a="1" b='
chunk2 = '"2"><sub c="'
chunk3 = '3">test</sub'
chunk4 = '></el><el d='
chunk5 = '"4" e="5"></'
chunk6 = 'el>'

How can I parse this stream so that a function is called whenever a single el element is completed?

So far I have been using this approach (using ElementTree):

import xml.etree.ElementTree as ET

text = ""

def handle_message(msg):
    global text    # text is module-level state; without this, += raises UnboundLocalError
    text += msg
    try:
        root = ET.fromstring("<root>" + text + "</root>")
        for el in list(root):
            handle_element(el)
        text = ""
        return True
    except ET.ParseError:
        return False

However, this approach does not really work, because it only calls handle_element when text happens to contain a well-formed XML document, and there is no guarantee that it ever will.

python xml xml-parsing

basilikum Jul 25 '14 at 13:46

2 answers

You could buffer the incoming data with io.BytesIO or io.StringIO and parse it only once a complete XML document has been received.

For example:

from io import StringIO

class Receiver(object):    # illustrative class name; the original showed only the methods
    def __init__(self):
        self.buffer = StringIO()    # accumulates the received text

    def dataReceived(self, data):
        # data is one chunk received from the server
        self.buffer.write(data)    # usually called from a callback

    def processBuffer(self):
        string = self.buffer.getvalue()
        # Do your parsing here. Once the buffer holds the complete XML,
        # call etree.fromstring(string) or an equivalent.

notorious.no Jul 25 '14 at 13:58

unutbu · Accepted Answer · 2014-07-25T17:57:22+0000

Perhaps you can use ET.iterparse to parse the XML fragments incrementally:

import xml.etree.ElementTree as ET

chunks = iter([
    '<root>'
    '<el a="1" b=',
    '"2"><sub c="',
    '3">test</sub',
    '></el><el d=',
    '"4" e="5"></',
    'el>',
    '</root>'
    ])


class Source(object):
    def read(self, size):
        # Replace this with code that reads XML chunks from the server.
        # Return '' once the data is exhausted so that iterparse stops cleanly.
        return next(chunks, '')

for event, elem in ET.iterparse(Source(), events=('end', )):
    if elem.tag == 'el':
        print(elem)
        # handle_element(elem)

<Element 'el' at 0xb744f6cc>
<Element 'el' at 0xb744f84c>

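ET.iterparse is usually given a file or a file-like object such as io.BytesIO or StringIO. However, any object with a read method will do, so you can wrap the server connection in a small class like Source above and pass it to ET.iterparse.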
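The fragments in the question have no common root element, which is why the example above adds '<root>' and '</root>' to the chunk list by hand. Below is a minimal sketch of a source that adds them itself, assuming the chunks arrive as any iterable of strings. RootWrappedSource and completed_elements are hypothetical names, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


class RootWrappedSource(object):
    """File-like wrapper (hypothetical): '<root>', then each chunk, then '</root>'."""

    def __init__(self, chunks):
        self._parts = self._wrap(chunks)

    @staticmethod
    def _wrap(chunks):
        yield '<root>'
        for chunk in chunks:
            if chunk:            # skip empty chunks; '' would signal end of input
                yield chunk
        yield '</root>'

    def read(self, size=-1):
        # size is ignored; an empty string tells iterparse the input has ended
        return next(self._parts, '')


def completed_elements(chunks, tag='el'):
    """Yield each element named `tag` once its end tag has been parsed."""
    for event, elem in ET.iterparse(RootWrappedSource(chunks), events=('end',)):
        if elem.tag == tag:
            yield elem


server_chunks = ['<el a="1" b=', '"2"><sub c="', '3">test</sub',
                 '></el><el d=', '"4" e="5"></', 'el>']
for el in completed_elements(server_chunks):
    print(el.tag, sorted(el.attrib.items()))
```

In real use, server_chunks would be replaced by a generator that blocks until the next piece of data arrives from the connection.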

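Note that ET.iterparse requests data in large blocks (it calls read(16384)). The Source class above ignores the size argument and simply returns the next chunk, however small.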
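If the chunks are pushed to you in a callback, as in the question's handle_message, rather than pulled through read, a non-blocking alternative may fit better. This is a different technique from the iterparse approach above: a minimal sketch using ET.XMLPullParser (available since Python 3.4), assuming a synthetic '<root>' is fed first so that the fragments form a single document. handle_element here is only a stand-in that records what it receives.

```python
import xml.etree.ElementTree as ET

handled = []    # stand-in storage so the effect of handle_element is visible


def handle_element(el):
    # Placeholder for the real per-element processing
    handled.append((el.tag, dict(el.attrib)))


parser = ET.XMLPullParser(events=('end',))
parser.feed('<root>')    # synthetic root element (an assumption of this sketch)


def handle_message(msg):
    """Feed one chunk and handle every el element that it completes."""
    parser.feed(msg)
    for event, elem in parser.read_events():
        if elem.tag == 'el':
            handle_element(elem)


for chunk in ['<el a="1" b=', '"2"><sub c="', '3">test</sub',
              '></el><el d=', '"4" e="5"></', 'el>']:
    handle_message(chunk)

print(handled)
```

Because feed never blocks, handle_message can be called directly from the network callback. The parser keeps its state between calls, so no manual buffering of partial text is needed.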
