Processing a large XML file with Perl

I have an XML file about 200 MB in size, and I want to extract selected information from it record by record.

I wrote a Perl script using the XML::LibXML module to parse the file and then extract the information I need. This is inefficient because it reads the entire file into memory, but I like XML::LibXML because I can address the required information with XPath expressions.
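For reference, a stripped-down version of my current approach looks like this (element names and the XPath are placeholders for my real data, and a small inline sample stands in for the 200 MB file):

```perl
use strict;
use warnings;
use XML::LibXML;

# Small stand-in for the real file; in practice this would be
# XML::LibXML->load_xml(location => 'big.xml').
my $xml = <<'XML';
<records>
  <record><title>First</title></record>
  <record><title>Second</title></record>
</records>
XML

# Parse the whole document into a DOM tree -- this is the
# memory-hungry step: the entire file is held in RAM at once.
my $doc = XML::LibXML->load_xml(string => $xml);

# XPath makes it easy to address exactly the nodes I need
# ('/records/record/title' is a placeholder for the real path).
print $_->textContent, "\n" for $doc->findnodes('/records/record/title');
```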

Could I get suggestions on how to improve the performance of my code?

Through searching, I came across XML::SAX and XML::LibXML::SAX, but I cannot find documentation that explains how to use them, and they do not seem to support any XPath-style addressing.

2 answers

Have you looked at the XML::Twig module? It is much more efficient for processing large files, as its CPAN page points out:

NAME

XML::Twig - a perl module for processing huge XML documents in tree mode.

SYNOPSIS

...

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the twig_roots and twig_print_outside_roots options.

...
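A minimal sketch of the twig_roots approach described above (the element names are placeholders, and a small inline sample stands in for the real file, which you would instead pass to parsefile):

```perl
use strict;
use warnings;
use XML::Twig;

# Small stand-in for the real file; for a 200 MB file you would
# call $twig->parsefile('big.xml') instead of parsing a string.
my $xml = <<'XML';
<records>
  <record><title>First</title></record>
  <record><title>Second</title></record>
</records>
XML

my @titles;

# twig_roots builds a tree only for each <record> element; everything
# outside those roots is never kept, so memory use stays flat no
# matter how large the file is. 'record'/'title' are placeholders.
my $twig = XML::Twig->new(
    twig_roots => {
        'record' => sub {
            my ($t, $record) = @_;
            push @titles, $record->field('title');  # text of first <title> child
            $t->purge;                              # free this record's memory
        },
    },
);
$twig->parse($xml);

print "$_\n" for @titles;
```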


I have had good luck with XML::Twig, but I ended up using XML::LibXML::Reader, which is much faster. You can also look at XML::LibXML::Pattern if you need XPath-style matching while reading.
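A sketch of combining the reader with a compiled pattern (element names and the pattern path are placeholders, and a small inline sample stands in for the real file, which you would pass as location => 'big.xml'):

```perl
use strict;
use warnings;
use XML::LibXML::Reader;
use XML::LibXML::Pattern;

# Small stand-in for the real file.
my $xml = <<'XML';
<records>
  <record><title>First</title></record>
  <record><title>Second</title></record>
</records>
XML

# Pull-parse: the reader walks the document node by node, keeping
# only the current node in memory. The pattern path is a placeholder.
my $reader  = XML::LibXML::Reader->new(string => $xml);
my $pattern = XML::LibXML::Pattern->new('/records/record');

my @titles;
while ($reader->read) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
             && $reader->matchesPattern($pattern);
    # Copy just the matched subtree into a DOM node so that full
    # XPath is available on it.
    my $record = $reader->copyCurrentNode(1);
    push @titles, $record->findvalue('./title');
}

print "$_\n" for @titles;
```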

