How can I add entity declarations via XML::Twig programmatically?

For the life of me, I can't make sense of the XML::Twig documentation on entity handling.

I have XML that I generate using HTML::Tidy. The call is as follows:

    my $tidy = HTML::Tidy->new({
        'indent'          => 1,
        'break-before-br' => 1,
        'output-xhtml'    => 0,
        'output-xml'      => 1,
        'char-encoding'   => 'raw',
    });

    $str = "foo &nbsp; bar";
    $xml = $tidy->clean("<xml>$str</xml>");

which produces:

    <html>
      <head>
        <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
        <title></title>
      </head>
      <body>foo &nbsp; bar</body>
    </html>

XML::Twig (understandably) barfs on the &nbsp;. I want to do some transformations by running the document through XML::Twig:

    my $twig = XML::Twig->new(
        twig_handlers => {... handlers ...}
    );
    $twig->parse($xml);

The $twig->parse line fails on the &nbsp;, but I can't figure out how to declare the &nbsp; entity programmatically. I tried things like:

    my $entity = XML::Twig::Entity->new("nbsp", "&#160;");
    $twig->entity_list->add($entity);
    $twig->parse($xml);

... but without joy.

Please help =)

3 answers
    use strict;
    use XML::Twig;

    # Prepend a DOCTYPE whose internal subset declares the nbsp entity
    my $doctype = '<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html [<!ENTITY nbsp "&#160;">]>';
    my $xml = '<html><head><meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" /><title></title></head><body>foo &nbsp; bar</body></html>';

    my $xTwig = XML::Twig->new();
    $xTwig->safe_parse($doctype . $xml)
        or die "Failure to parse XML : $@ ";
    print $xTwig->sprint();
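If more than one entity has to be declared, the same idea can be wrapped in a small helper. This is only a sketch: `doctype_with_entities` is a hypothetical name, not part of XML::Twig, and the `&copy;` entity and sample document are made up for illustration. It assumes the replacement values contain no double quotes.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not an XML::Twig function): build a DOCTYPE whose
# internal subset declares every entity in %$entities (name => value).
sub doctype_with_entities {
    my ($root_name, $entities) = @_;
    my $decls = join '',
        map { qq{<!ENTITY $_ "$entities->{$_}">} } sort keys %$entities;
    return "<!DOCTYPE $root_name [$decls]>";
}

# Sample input, assumed for illustration only
my $xml     = '<html><body>foo &nbsp; bar &copy; baz</body></html>';
my $doctype = doctype_with_entities(
    html => { nbsp => '&#160;', copy => '&#169;' },
);

my $twig = XML::Twig->new;
$twig->safe_parse($doctype . $xml)
    or die "Failed to parse XML: $@";

# Print only the root element, which leaves the prepended DOCTYPE out
$twig->root->print;
```

Printing the root rather than the whole twig keeps the generated DOCTYPE out of the output, which may or may not be what you want.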

A dirty but effective trick in such a case would be to add a fake DTD declaration.

XML::Parser, which does the actual parsing, will then assume that the entity is defined in the DTD and will not choke on it.

To get rid of the fake DTD declaration, you can output just the root of the twig. If you need a different declaration, create it and replace the current one:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use XML::Twig;

    my $fake_dtd= '<!DOCTYPE head SYSTEM "foo"[]>'; # foo may not even exist

    my $xml='<html>
      <head>
        <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
        <title></title>
      </head>
      <body>foo &nbsp; bar</body>
    </html>';

    XML::Twig->new->parse( $fake_dtd . $xml)->root->print;

There may be a better way, but the code below worked for me:

    my $filter = sub {
        my $text  = shift;
        my $ascii = "\x{a0}";   # non breaking space
        my $nbsp  = '&nbsp;';
        $text =~ s/$ascii/$nbsp/g;   # /g: replace every occurrence, not just the first
        return $text;
    };

    XML::Twig->new( output_filter => $filter )
             ->parse_html( $xml )
             ->print;