XML Elements Truncated When Using iterparse

I have a large XML file (about 600 MB) that I am trying to parse with cElementTree's iterparse. This is my first attempt at it.

I iterate over the "product" tags and call elem.clear() after processing each product. As part of the parsing, I have a function parse_trips that uses a for loop to parse the <trip> tags inside the <trips> tag (each product can have hundreds of trips, and each trip contains hundreds of lines):
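For reference, a loop of the kind described above might look like the following minimal sketch. It assumes a root element wrapping <product> elements, each with a <trips> child; `parse_trips` here is a hypothetical stand-in that only collects each trip's <code>, and the stdlib `xml.etree.ElementTree` is used because `cElementTree` was merged into it in Python 3.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature of the real file: a root wrapping <product> elements.
SAMPLE = b"""<catalog>
  <product><trips>
    <trip><code>A</code></trip>
    <trip><code>B</code></trip>
  </trips></product>
  <product><trips>
    <trip><code>C</code></trip>
  </trips></product>
</catalog>"""

def parse_trips(trips):
    # Hypothetical helper: collect the <code> text of every <trip>.
    return [trip.findtext("code") for trip in trips.findall("trip")]

def parse_products(source):
    results = []
    # Only "end" events are requested, so each <product> is fully built,
    # children included, by the time it is handled here.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "product":
            results.append(parse_trips(elem.find("trips")))
            elem.clear()  # release the product's children to save memory
    return results

print(parse_products(io.BytesIO(SAMPLE)))  # [['A', 'B'], ['C']]
```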

# trips is the <trips> element of the current product
for trip in trips:
    dump(trip)                # debug: print the whole <trip> subtree
    get_date(trip, product)
    set_price(trip, product)

However, when I call dump(trip), I see that these elements are truncated, i.e. closed early, without any error being raised. It looks as if the parser reaches some maximum memory size for the element and then simply stops holding the rest of it.
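One documented cause of exactly this symptom, offered as a possibility rather than a diagnosis since the full loop isn't shown: when an element is handled on a "start" event, ElementTree only guarantees that the start tag has been read, so its children may or may not be present yet, depending on how much input the parser has consumed. The sketch below uses `xml.etree.ElementTree.XMLPullParser` to feed a small document in two chunks, which makes the effect reproducible; the document and the split point are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<trip><events>"
    "<event>1</event><event>2</event><event>3</event>"
    "<event>4</event><event>5</event>"
    "</events></trip>"
)

def events_seen(event_kind, split_at):
    """Count the children of <events> at the moment its `event_kind` fires."""
    parser = ET.XMLPullParser(events=("start", "end"))
    counts = []
    # Feed the document in two chunks to mimic iterparse reading a buffer.
    for chunk in (DOC[:split_at], DOC[split_at:]):
        parser.feed(chunk)
        for kind, elem in parser.read_events():
            if kind == event_kind and elem.tag == "events":
                counts.append(len(elem))
    parser.close()
    return counts[0]

# Split on a tag boundary so the first chunk ends with a complete </event>.
split = DOC.index("<event>4")
print("on start:", events_seen("start", split))  # fewer than 5
print("on end:  ", events_seen("end", split))    # all 5
```

If this matches the real code, the usual remedy is to do the work only when the "end" event for the element arrives, as the Python documentation recommends for fully populated elements.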

Raw XML:

<trip>
    <code>text</code>
    <name>text</name>
    <image>img.jpg</image>
    <date>2014-08-10</date>
    <pricing>

    </pricing>
    <itinerary>
        <code>1</code>
        <events>
            <event>
                eventInfo 1
            </event>
            <event>
                eventInfo 2
            </event>
            <event>
                eventInfo 3
            </event>
            <event>
                eventInfo 4
            </event>
            <event>
                eventInfo 5
            </event>
            <event>
                eventInfo 6
            </event>
            <event>
                eventInfo 7
            </event>
            <event>
                eventInfo 8
            </event>
        </events>
    </itinerary>
</trip>
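A quick way to tell whether the file or the processing loop is at fault is to count the <event> children of each trip instead of eyeballing dump() output. The sketch below rebuilds the trip above as a string (element text is placeholder data) and counts its events with `Element.findall` and a relative path; `count_events` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Rebuild the sample <trip> from the question; element text is placeholder data.
events = "".join(f"<event>eventInfo {i}</event>" for i in range(1, 9))
TRIP_XML = (
    "<trip><code>text</code><name>text</name><image>img.jpg</image>"
    "<date>2014-08-10</date><pricing></pricing>"
    f"<itinerary><code>1</code><events>{events}</events></itinerary></trip>"
)

def count_events(trip):
    # Relative path from <trip>: itinerary/events/event
    return len(trip.findall("itinerary/events/event"))

trip = ET.fromstring(TRIP_XML)
print(count_events(trip))  # 8
```

Logging count_events(trip) inside the for loop makes a truncated trip stand out immediately, for example 3 events where the file contains 8.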

And here is what dump(trip) prints for this trip:

<trip>
    <code>text</code>
    <name>text</name>
    <image>img.jpg</image>
    <date>2014-08-10</date>
    <pricing></pricing>
    <itinerary>
        <code>1</code>
        <events>
            <event>
                eventInfo 1
            </event>
            <event>
                eventInfo 2
            </event>
            <event>
                eventInfo 3
            </event>
        </events>
    </itinerary>
</trip>

This happens to the <trip> elements processed in the for loop.

Is this a limitation of iterparse? Should I iterate differently so that each <trips> element is complete before I process it?
