Processing a large XML file with Perl

I have an XML file about 200 MB in size, and I want to extract selected information from it record by record.

I wrote a Perl script using the XML::LibXML module to parse the file and then extract the information I need. This is inefficient because it reads the entire file into memory, but I like XML::LibXML because I can address the required information with XPath expressions.
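For reference, a stripped-down version of my current approach looks like this (element names and the XPath are placeholders for my real data, and a small inline sample stands in for the 200 MB file):

```perl
use strict;
use warnings;
use XML::LibXML;

# Small stand-in for the real file; in practice this would be
# XML::LibXML->load_xml(location => 'big.xml').
my $xml = <<'XML';
<records>
  <record><title>First</title></record>
  <record><title>Second</title></record>
</records>
XML

# Parse the whole document into a DOM tree -- this is the
# memory-hungry step: the entire file is held in RAM at once.
my $doc = XML::LibXML->load_xml(string => $xml);

# XPath makes it easy to address exactly the nodes I need
# ('/records/record/title' is a placeholder for the real path).
print $_->textContent, "\n" for $doc->findnodes('/records/record/title');
```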

Could I get suggestions on how to improve the performance of my code?

Through searching, I came across XML::SAX and XML::LibXML::SAX, but I cannot find documentation that explains how to use them, and they do not seem to support any XPath-style addressing.

2 answers

Have you looked at the XML::Twig module? It is much more efficient for processing large files, as its CPAN page points out:

NAME

XML::Twig - a perl module for processing huge XML documents in tree mode.

SYNOPSIS

...

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the twig_roots and twig_print_outside_roots options.

...
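A minimal sketch of the twig_roots approach described above (the element names are placeholders, and a small inline sample stands in for the real file, which you would instead pass to parsefile):

```perl
use strict;
use warnings;
use XML::Twig;

# Small stand-in for the real file; for a 200 MB file you would
# call $twig->parsefile('big.xml') instead of parsing a string.
my $xml = <<'XML';
<records>
  <record><title>First</title></record>
  <record><title>Second</title></record>
</records>
XML

my @titles;

# twig_roots builds a tree only for each <record> element; everything
# outside those roots is never kept, so memory use stays flat no
# matter how large the file is. 'record'/'title' are placeholders.
my $twig = XML::Twig->new(
    twig_roots => {
        'record' => sub {
            my ($t, $record) = @_;
            push @titles, $record->field('title');  # text of first <title> child
            $t->purge;                              # free this record's memory
        },
    },
);
$twig->parse($xml);

print "$_\n" for @titles;
```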


I have had good luck with XML::Twig, but I ended up using XML::LibXML::Reader, which is much faster. You can also look at XML::LibXML::Pattern if you need XPath-style matching while reading.
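A sketch of combining the reader with a compiled pattern (element names and the pattern path are placeholders, and a small inline sample stands in for the real file, which you would pass as location => 'big.xml'):

```perl
use strict;
use warnings;
use XML::LibXML::Reader;
use XML::LibXML::Pattern;

# Small stand-in for the real file.
my $xml = <<'XML';
<records>
  <record><title>First</title></record>
  <record><title>Second</title></record>
</records>
XML

# Pull-parse: the reader walks the document node by node, keeping
# only the current node in memory. The pattern path is a placeholder.
my $reader  = XML::LibXML::Reader->new(string => $xml);
my $pattern = XML::LibXML::Pattern->new('/records/record');

my @titles;
while ($reader->read) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
             && $reader->matchesPattern($pattern);
    # Copy just the matched subtree into a DOM node so that full
    # XPath is available on it.
    my $record = $reader->copyCurrentNode(1);
    push @titles, $record->findvalue('./title');
}

print "$_\n" for @titles;
```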

