I have a large XML file (about 600 MB) that I am trying to parse using cElementTree with iterparse. This is my first attempt at using it.
I iterate over the "product" tags and call elem.clear() after processing each product. In my parsing code I have a function parse_trips that uses a for loop to parse the <trip> tags inside the <trips> tag (each product can have hundreds of them, and each trip contains hundreds of lines).
for trip in trips:
    dump(trip)
    get_date(trip, product)
    set_price(trip, product)
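A minimal, self-contained sketch of the loop described above, written against xml.etree.ElementTree (cElementTree was deprecated in Python 3.3 and removed in 3.9; the plain module uses the C accelerator automatically). It assumes each <product> holds a single <trips> wrapper. get_date and set_price are hypothetical stand-ins for helpers the post doesn't show, and the sample document is made up. The loop listens only for "end" events, which is the point at which ElementTree guarantees a <product> is fully populated.

```python
import io
import xml.etree.ElementTree as ET  # uses the C accelerator automatically


def get_date(trip, product):
    # Hypothetical stand-in for the author's helper: collect the <date> text.
    product["dates"].append(trip.findtext("date"))


def set_price(trip, product):
    # Hypothetical stand-in: collect the stripped <pricing> text ("" if empty).
    product["prices"].append((trip.findtext("pricing") or "").strip())


def parse_trips(product_elem, product):
    # Visit every <trip> inside this product's <trips> wrapper.
    for trip in product_elem.iterfind("trips/trip"):
        get_date(trip, product)
        set_price(trip, product)


def parse_products(source):
    results = []
    # React only to "end" events: by then the whole <product> subtree is built.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "product":
            product = {"dates": [], "prices": []}
            parse_trips(elem, product)
            results.append(product)
            elem.clear()  # free the finished product before parsing the next one
    return results


SAMPLE = (
    b"<catalog><product><trips>"
    b"<trip><date>2014-08-10</date><pricing></pricing></trip>"
    b"<trip><date>2014-09-01</date><pricing>99</pricing></trip>"
    b"</trips></product></catalog>"
)

if __name__ == "__main__":
    print(parse_products(io.BytesIO(SAMPLE)))
```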
However, when I dump(trip), I see that these tags are truncated, i.e. closed early, without any error. It looks as if the parser hits some maximum size for an element in memory and then simply stops holding any more of its content.
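One alternative explanation worth ruling out (an assumption, since the loop that drives iterparse isn't shown): the ElementTree documentation states that a "start" event only guarantees the opening tag has been read, so an element's children may or may not be present yet; only the "end" event guarantees a fully populated element. iterparse reads the file in chunks, so an element grabbed at "start" (or used while it is still open) is a snapshot of whatever has been parsed so far, which looks exactly like silent truncation. This sketch reproduces that deterministically with XMLPullParser by feeding a document in two chunks; the 120-byte split point and the event_counts name are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# One <trip> with eight <event> children, mirroring the raw XML in the question.
EVENTS = b"".join(b"<event>eventInfo %d</event>" % i for i in range(1, 9))
DOC = b"<trip><itinerary><events>" + EVENTS + b"</events></itinerary></trip>"


def event_counts(doc, split_at):
    """Count <event> descendants of a <trip> captured at its "start" event,
    once after the first chunk and once after the whole document."""
    parser = ET.XMLPullParser(events=("start", "end"))
    trip = None

    parser.feed(doc[:split_at])  # first chunk only, as a chunked reader would
    for event, elem in parser.read_events():
        if event == "start" and elem.tag == "trip":
            trip = elem  # only the opening tag is guaranteed at this point
    partial = len(trip.findall(".//event"))

    parser.feed(doc[split_at:])  # the rest of the document
    parser.close()
    list(parser.read_events())   # drain the remaining events
    complete = len(trip.findall(".//event"))
    return partial, complete


if __name__ == "__main__":
    print(event_counts(DOC, 120))  # the first number is smaller than the second
```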
The raw XML:
<trip>
<code>text</code>
<name>text</name>
<image>img.jpg</image>
<date>2014-08-10</date>
<pricing>
</pricing>
<itinerary>
<code>1</code>
<events>
<event>
eventInfo 1
</event>
<event>
eventInfo 2
</event>
<event>
eventInfo 3
</event>
<event>
eventInfo 4
</event>
<event>
eventInfo 5
</event>
<event>
eventInfo 6
</event>
<event>
eventInfo 7
</event>
<event>
eventInfo 8
</event>
</events>
</itinerary>
</trip>
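For reference, a hypothetical helper (summarize_trip is an illustrative name, not from the original code) that reads the fields of this structure with ElementTree path expressions. An intact trip like the one above yields eight events, so comparing that count with what dump(trip) prints is a quick programmatic check for truncation.

```python
import xml.etree.ElementTree as ET

# The <trip> from the question, compacted onto fewer lines.
TRIP_XML = (
    "<trip><code>text</code><name>text</name><image>img.jpg</image>"
    "<date>2014-08-10</date><pricing>\n</pricing>"
    "<itinerary><code>1</code><events>"
    + "".join(f"<event>\neventInfo {i}\n</event>" for i in range(1, 9))
    + "</events></itinerary></trip>"
)


def summarize_trip(trip):
    """Hypothetical helper: read the fields shown in the raw XML."""
    return {
        "code": trip.findtext("code"),  # "code" matches direct children only
        "date": trip.findtext("date"),
        "itinerary_code": trip.findtext("itinerary/code"),
        # Strip the newlines around each event's text.
        "events": [(e.text or "").strip()
                   for e in trip.iterfind("itinerary/events/event")],
    }


if __name__ == "__main__":
    summary = summarize_trip(ET.fromstring(TRIP_XML))
    print(summary["date"], len(summary["events"]))  # 2014-08-10 8
```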
But dump(trip) shows this instead:
<trip>
<code>text</code>
<name>text</name>
<image>img.jpg</image>
<date>2014-08-10</date>
<pricing></pricing>
<itinerary>
<code>1</code>
<events>
<event>
eventInfo 1
</event>
<event>
eventInfo 2
</event>
<event>
eventInfo 3
</event>
</events>
</itinerary>
</trip>
This happens to the <trip> elements inside the for loop.
Is this a limitation of the parser / iterparse? Should I use iter instead, or some other approach, to go through <trips>?
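A sketch of one way to stay with iterparse but handle each <trip> as soon as it is complete, instead of waiting for the whole <product>. It assumes the enclosing <product> carries an identifying attribute (the id attribute, stream_trips, and the sample data are all hypothetical); attributes are available at the "start" event, while each trip is processed on its own "end" event, when ElementTree guarantees it is fully parsed, and cleared right after. That should keep memory much closer to the size of one trip than of one product, even when a product has hundreds of trips.

```python
import io
import xml.etree.ElementTree as ET


def stream_trips(source):
    """Yield (product_id, trip) pairs, each <trip> fully parsed.

    A trip is cleared as soon as the caller advances the generator, so
    extract what you need before asking for the next one.
    """
    product_id = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "product":
            product_id = elem.get("id")  # attributes are ready at "start"
        elif event == "end" and elem.tag == "trip":
            yield product_id, elem       # every child of this trip is present
            elem.clear()                 # release the trip's subtree
        elif event == "end" and elem.tag == "product":
            elem.clear()                 # drop what is left of the product


def _trip(date, n_events):
    # Builds a small <trip> for the demo below.
    events = "".join(f"<event>e{i}</event>" for i in range(n_events))
    return f"<trip><date>{date}</date><itinerary><events>{events}</events></itinerary></trip>"


SAMPLE = (
    '<catalog>'
    f'<product id="p1"><trips>{_trip("2014-08-10", 3)}{_trip("2014-08-11", 2)}</trips></product>'
    f'<product id="p2"><trips>{_trip("2014-08-12", 1)}</trips></product>'
    '</catalog>'
).encode()

if __name__ == "__main__":
    for pid, trip in stream_trips(io.BytesIO(SAMPLE)):
        print(pid, trip.findtext("date"), len(trip.findall("itinerary/events/event")))
```

Because each trip is cleared after it is yielded, holding on to a trip element across iterations gives an empty element; copy out the values you need first.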