Purge XML::Twig inside a handler sub

I am parsing large XML files (60GB+) with XML::Twig and using it in an OO (Moose) script. I use the twig_handlers parameter to process elements as soon as they have been read into memory. However, I'm not sure how to deal with the element and the twig.

Before I used Moose (and OO in general), my script looked like this (and worked):

    my $twig = XML::Twig->new(
        twig_handlers => { $outer_tag => \&_process_tree, }
    );
    $twig->parsefile($input_file);

    sub _process_tree {
        my ($fulltwig, $twig) = @_;

        $twig->cut;
        $fulltwig->purge;
        # Do stuff with twig
    }

Now I do it like this:

    my $twig = XML::Twig->new(
        twig_handlers => {
            $self->outer_tag => sub {
                $self->_process_tree($_);
            }
        }
    );
    $twig->parsefile($self->input_file);

    sub _process_tree {
        my ($self, $twig) = @_;

        $twig->cut;
        # Do stuff with twig
        # But now the 'full twig' is not purged
    }

The thing is, I now see that I'm missing the full twig. I figured that in the first, non-OO version, purging would help save memory: get rid of the full twig as soon as I can. However, when using OO (and having to rely on an explicit sub{} inside the handler), I don't see how I can purge the full twig, because the documentation says that

$_ is also set to the element, so it is easy to write inline handlers like

    para => sub { $_->set_tag('p'); }

So the documentation talks about the element you want to process, but not about the full twig itself. How can I purge the full twig if it is not passed to the subroutine?

xml perl xml-twig
1 answer

The handler still receives the full twig; you just don't use it (you use $_ instead).

As it turns out, you can still call purge on what you call the twig (which I usually call the "element", or elt in the docs): $_->purge works as expected, purging the full twig up to the current element in $_.

The cleaner way (IMHO) is to actually receive all the arguments and explicitly purge the full twig:

    my $twig = XML::Twig->new(
        twig_handlers => {
            $self->outer_tag => sub {
                $self->_process_tree(@_); # pass _all_ of the arguments
            }
        }
    );
    $twig->parsefile($self->input_file);

    sub _process_tree {
        my ($self, $full_twig, $twig) = @_; # now you see them!

        $twig->cut;
        # Do stuff with twig
        $full_twig->purge;                  # now you don't
    }
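For reference, here is a minimal self-contained sketch of the same "pass @_ through" pattern. It makes a few assumptions that are not in the original: the class name RecordProcessor, the record tag, the id attribute and the parse_string method are all hypothetical; it parses an inline string with parse instead of parsefile so it runs without a 60GB file; and it uses a plain blessed hash rather than Moose to avoid the extra dependency (with Moose, outer_tag would simply be a has attribute). The element is named $elt rather than $twig to keep it distinct from the full twig.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Moose class from the question (plain bless, no Moose).
package RecordProcessor;

use XML::Twig;

sub new {
    my ($class, %args) = @_;
    my $self = {
        outer_tag => $args{outer_tag},  # element to handle, e.g. 'record' (hypothetical)
        seen      => [],                # ids collected by _process_tree
    };
    return bless $self, $class;
}

sub outer_tag { $_[0]{outer_tag} }
sub seen      { @{ $_[0]{seen} } }

sub parse_string {
    my ($self, $xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Forward every handler argument (full twig, element) to the method.
            $self->outer_tag => sub { $self->_process_tree(@_); },
        },
    );
    $twig->parse($xml);
    return $self;
}

sub _process_tree {
    my ($self, $full_twig, $elt) = @_;
    $elt->cut;                                # detach the element from the tree
    push @{ $self->{seen} }, $elt->att('id'); # "do stuff" with the element
    $full_twig->purge;                        # release what has been parsed so far
    return 1;
}

package main;

my $xml = '<root><record id="a"/><record id="b"/><record id="c"/></root>';
my $p   = RecordProcessor->new(outer_tag => 'record')->parse_string($xml);
print join(',', $p->seen), "\n";
```

Because the closure passes @_ unchanged, the method signature mirrors the non-OO handler: the full twig arrives as the first argument after $self, so it can be purged without relying on $_.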
