Using Perl XML::LibXML to work with XML is very slow

The XML file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<resource-data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="resource-data.xsd">
  <class name="AP">
    <attributes>
      <resourceId>00 11 B5 1B 6D 20</resourceId>
      <lastModifyTime>20130107091545</lastModifyTime>
      <dcTime>20130107093019</dcTime>
      <attribute name="NMS_ID" value="DNMS" />
      <attribute name="IP_ADDR" value="10.11.141.111" />
      <attribute name="LABEL_DEV" value="00 11 B5 1B 6D 20" />
    </attributes>
    <attributes>
      <resourceId>00 11 B5 1B 6D 21</resourceId>
      <lastModifyTime>20130107091546</lastModifyTime>
      <dcTime>20130107093019</dcTime>
      <attribute name="NMS_ID" value="DNMS" />
      <attribute name="IP_ADDR" value="10.11.141.112" />
      <attribute name="LABEL_DEV" value="00 11 B5 1B 6D 21" />
    </attributes>
  </class>
</resource-data>

And my code is:

 #!/usr/bin/perl
 use strict;
 use warnings;
 use Encode;
 use XML::LibXML;

 my $parser = XML::LibXML->new;
 my $struct = $parser->parse_file('d:/AP_201301073100_1.xml');

 my $file_data = 'd:\\ap.txt';
 open my $in, '>', $file_data or die "Cannot open $file_data: $!";

 my $rootel = $struct->getDocumentElement();
 my @kids   = $rootel->getElementsByTagName('attributes');
 foreach my $child (@kids) {
     my @atts = $child->getElementsByTagName('attribute');
     foreach my $at (@atts) {
         my $va = $at->getAttribute('value');
         print $in encode('gbk', "$va\t");
     }
     print $in encode('gbk', "\n");
 }
 close $in;

My question is: when the XML file is only about 80 MB the program runs quickly, but with much larger files it becomes very slow. Can someone help me speed this up, please?

5 answers

Another possibility is to use XML::LibXML::Reader. It works similarly to SAX, but uses the same libxml2 library as XML::LibXML:

 #!/usr/bin/perl
 use warnings;
 use strict;
 use XML::LibXML::Reader;

 my $reader = XML::LibXML::Reader->new(location => '1.xml');
 open my $OUT, '>:encoding(gbk)', '1.out';

 while ($reader->read) {
     attr($reader) if 'attributes' eq $reader->name
                  and XML_READER_TYPE_ELEMENT == $reader->nodeType;
 }

 sub attr {
     my $reader = shift;
     my @kids;
   ATTRIBUTE:
     while ($reader->read) {
         my $name = $reader->name;
         last ATTRIBUTE if 'attributes' eq $name;
         next ATTRIBUTE if XML_READER_TYPE_END_ELEMENT == $reader->nodeType;
         push @kids, $reader->getAttribute('value') if 'attribute' eq $name;
     }
     print {$OUT} join("\t", @kids), "\n";
 }

Using XML::Twig, you can process each <attributes> element as it is parsed, then discard the XML data that is no longer required.

This program seems to do what you need.

 use strict;
 use warnings;
 use XML::Twig;

 use constant XML_FILE => 'S:/AP_201301073100_1.xml';
 use constant OUT_FILE => 'D:/ap.txt';

 open my $outfh, '>:encoding(gbk)', OUT_FILE or die $!;

 my $twig = XML::Twig->new(twig_handlers => { attributes => \&attributes });
 $twig->parsefile(XML_FILE);

 sub attributes {
     my ($twig, $atts) = @_;
     my @values = map $_->att('value'), $atts->children('attribute');
     print $outfh join("\t", @values), "\n";
     $twig->purge;
 }

Output

 DNMS	10.11.141.111	00 11 B5 1B 6D 20
 DNMS	10.11.141.112	00 11 B5 1B 6D 21

If your XML files are this large (80 MB and up), you should not parse the entire file into memory: firstly, it is very slow, and secondly, on sufficiently large files you will eventually run out of memory and your program will crash.

I would suggest rewriting your code to use XML::Twig with callbacks.


For large XML files, you should use a stream-based parser such as XML::SAX, because DOM parsers build the entire XML structure in memory.
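A minimal sketch of what such a handler might look like. The package name AttrHandler is illustrative, and the callback signatures follow the Perl SAX2 event interface that XML::SAX parsers use (each element event is a hashref with a `Name` key and an `Attributes` hashref keyed as `'{namespace}name'`):

```perl
#!/usr/bin/perl
use strict;
use warnings;

package AttrHandler;

sub new { bless { rows => [], row => undef }, shift }

# Called for every start tag the parser encounters.
sub start_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'attributes') {
        $self->{row} = [];                       # start a new output row
    }
    elsif ($el->{Name} eq 'attribute' && $self->{row}) {
        # Find the value="" attribute among the SAX2 attribute records.
        my ($v) = grep { $_->{Name} eq 'value' }
                  values %{ $el->{Attributes} };
        push @{ $self->{row} }, $v->{Value} if $v;
    }
}

# Called for every end tag; emit one tab-separated row per <attributes>.
sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'attributes') {
        push @{ $self->{rows} }, $self->{row};   # keep the finished row
        print join("\t", @{ $self->{row} }), "\n";
        $self->{row} = undef;
    }
}

package main;

# With XML::SAX installed, the handler would be wired up roughly like
# this (not run here):
#   use XML::SAX::ParserFactory;
#   my $p = XML::SAX::ParserFactory->parser(Handler => AttrHandler->new);
#   $p->parse_uri('d:/AP_201301073100_1.xml');
```

Because the parser hands the handler one event at a time and the handler keeps only the current row, memory use stays flat no matter how large the file is.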


Another way: XML::Rules:

 use strict;
 use warnings;
 use XML::Rules;

 my @rules = (
     attribute => [
         attributes => sub { print "$_[1]{value}\n"; return },
     ],
     _default => undef,
 );

 my $xr = XML::Rules->new(rules => \@rules);
 $xr->parsefile('d:/AP_201301073100_1.xml');
