Why does XML::LibXML find no nodes for this XPath query when a namespace is used?

I am trying to select a node using an XPath query, and I do not understand why XML::LibXML does not find the node when it has an xmlns attribute. Here is a script that demonstrates the problem:

    #!/usr/bin/perl
    use XML::LibXML; # 1.70 on libxml2 from libxml2-dev 2.6.16-7sarge1 (don't ask)
    use XML::XPath;  # 1.13
    use strict;
    use warnings;
    use v5.8.4; # don't ask

    my ($xpath, $libxml, $use_namespace) = @ARGV;

    my $xml = sprintf(<<'END_XML', ($use_namespace ? 'xmlns="http://www.w3.org/2000/xmlns/"' : q{}));
    <?xml version="1.0" encoding="iso-8859-1"?>
    <RootElement>
      <MyContainer %s>
        <MyField>
          <Name>ID</Name>
          <Value>12345</Value>
        </MyField>
        <MyField>
          <Name>Name</Name>
          <Value>Ben</Value>
        </MyField>
      </MyContainer>
    </RootElement>
    END_XML

    my $xml_parser = $libxml
        ? XML::LibXML->load_xml(string => $xml, keep_blanks => 1)
        : XML::XPath->new(xml => $xml);

    my $nodecount = 0;
    foreach my $node ($xml_parser->findnodes($xpath)) {
        $nodecount ++;
        print "--NODE $nodecount--\n"; #would use say on newer perl
        print $node->toString($libxml && 1), "\n";
    }

    unless ($nodecount) {
        print "NO NODES FOUND\n";
    }

This script lets you choose between the XML::LibXML parser and the XML::XPath parser. It also lets you either set the xmlns attribute on the MyContainer element or leave it off, depending on the arguments passed.

I use the XPath expression "RootElement/MyContainer". When I run the query with the XML::LibXML parser and no namespace, it finds the node without problems:

    benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml
    --NODE 1--
    <MyContainer>
        <MyField>
          <Name>ID</Name>
          <Value>12345</Value>
        </MyField>
        <MyField>
          <Name>Name</Name>
          <Value>Ben</Value>
        </MyField>
      </MyContainer>

However, when I run it with the namespace in place, it does not find the nodes:

    benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml use_namespace
    NO NODES FOUND

Contrast this with the output when using the XML::XPath parser:

    benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 # no namespace
    --NODE 1--
    <MyContainer>
        <MyField>
          <Name>ID</Name>
          <Value>12345</Value>
        </MyField>
        <MyField>
          <Name>Name</Name>
          <Value>Ben</Value>
        </MyField>
      </MyContainer>
    benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 1 # with namespace
    --NODE 1--
    <MyContainer xmlns="http://www.w3.org/2000/xmlns/">
        <MyField>
          <Name>ID</Name>
          <Value>12345</Value>
        </MyField>
        <MyField>
          <Name>Name</Name>
          <Value>Ben</Value>
        </MyField>
      </MyContainer>

Which of these parser implementations handles this "correctly"? Why does XML::LibXML treat the document differently when I use the namespace? What can I do to retrieve the node when the namespace is in place?

xml perl xpath libxml2
3 answers

This is a FAQ. XPath considers any unprefixed name in an expression to belong to "no namespace".

So the expression:

    RootElement/MyContainer

selects all MyContainer elements that are in "no namespace" and are children of RootElement elements that are also in "no namespace" and are children of the context (current) node. However, once the xmlns attribute is present, MyContainer and all of its descendants belong to the default namespace declared there, so the document contains no MyContainer element in "no namespace".

This explains the result you get. XML::LibXML is right.
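To see which namespace each element actually ends up in, the following minimal sketch walks the parsed tree and prints every element's namespace URI. The URI `http://example.com/ns` is a placeholder chosen for illustration and is not taken from the question. The question's `http://www.w3.org/2000/xmlns/` is reserved by the Namespaces in XML specification and must not be declared as a default namespace, so it is avoided here. The helper name `namespace_report` is likewise hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Placeholder namespace URI (an assumption for this sketch, not from the question).
my $ns = 'http://example.com/ns';

my $doc = XML::LibXML->load_xml(string => <<"END_XML");
<RootElement>
  <MyContainer xmlns="$ns">
    <MyField><Name>ID</Name><Value>12345</Value></MyField>
  </MyContainer>
</RootElement>
END_XML

# Hypothetical helper: list every element with the namespace it belongs to.
# The name test '*' matches elements in any namespace, so nothing is skipped.
sub namespace_report {
    my ($doc) = @_;
    my @report;
    for my $el ($doc->findnodes('//*')) {
        my $uri = $el->namespaceURI;
        push @report, sprintf('%s => %s',
            $el->localname, defined $uri ? $uri : '(no namespace)');
    }
    return @report;
}

print "$_\n" for namespace_report($doc);
```

RootElement is reported as "(no namespace)" because it sits outside the scope of the xmlns declaration, while MyContainer and everything below it report the declared URI. An unprefixed `MyContainer` step therefore has nothing to match.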

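The general solution is to use the host language's API to bind a prefix to the namespace by "registering" the namespace. Every element that lives in that namespace is then addressed with the prefix. For the document in the question, where the declaration sits on MyContainer, that gives:

    RootElement/x:MyContainer

where x is the prefix with which the namespace was registered. RootElement stays unprefixed because it lies outside the scope of the xmlns declaration; in a document that declares the default namespace on the root element, every step needs the prefix (x:RootElement/x:MyContainer).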
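In Perl, namespace registration is done through XML::LibXML::XPathContext and its `registerNs` method. The sketch below is one way this might look, assuming a placeholder namespace URI (`http://example.com/ns`) rather than the reserved one from the question. Because only MyContainer carries the declaration, RootElement is left unprefixed in the expression.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Placeholder namespace URI (an assumption for this sketch, not from the question).
my $ns = 'http://example.com/ns';

my $doc = XML::LibXML->load_xml(string => <<"END_XML");
<RootElement>
  <MyContainer xmlns="$ns">
    <MyField><Name>ID</Name><Value>12345</Value></MyField>
    <MyField><Name>Name</Name><Value>Ben</Value></MyField>
  </MyContainer>
</RootElement>
END_XML

# Bind the prefix "x" to the namespace. The prefix is local to this
# XPath context; it does not have to appear anywhere in the document.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => $ns);

# Unprefixed names mean "no namespace", so this finds nothing.
my @plain = $doc->findnodes('RootElement/MyContainer');

# RootElement has no namespace; MyContainer and its children do.
my @prefixed = $xpc->findnodes('RootElement/x:MyContainer');
my @fields   = $xpc->findnodes('RootElement/x:MyContainer/x:MyField');

printf "unprefixed: %d, prefixed: %d, fields: %d\n",
    scalar @plain, scalar @prefixed, scalar @fields;
```

With this input the script prints `unprefixed: 0, prefixed: 1, fields: 2`.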

In the rare cases where the host language offers no way to register a namespace, use an expression such as:

    *[name()='RootElement']/*[name()='MyContainer']
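As a rough illustration of this fallback, the sketch below runs the `name()` expression without any namespace registration. It also tries a `local-name()` variant, which is an addition not mentioned above: `name()` returns the qualified name including any prefix, so it stops matching if the document uses an explicit prefix instead of a default namespace. The namespace URI and the helper `count_matches` are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Placeholder namespace URI (an assumption for this sketch, not from the question).
my $ns = 'http://example.com/ns';

# The same structure twice: once with a default namespace,
# once with an explicit prefix bound to the same namespace.
my $default_ns = XML::LibXML->load_xml(string =>
    qq{<RootElement><MyContainer xmlns="$ns"><MyField/></MyContainer></RootElement>});
my $prefixed = XML::LibXML->load_xml(string =>
    qq{<RootElement><p:MyContainer xmlns:p="$ns"><p:MyField/></p:MyContainer></RootElement>});

my $by_name  = q{*[name()='RootElement']/*[name()='MyContainer']};
my $by_local = q{*[local-name()='RootElement']/*[local-name()='MyContainer']};

# Hypothetical helper: number of nodes an expression selects in a document.
sub count_matches {
    my ($doc, $xpath) = @_;
    my @nodes = $doc->findnodes($xpath);
    return scalar @nodes;
}

printf "default namespace: name()=%d local-name()=%d\n",
    count_matches($default_ns, $by_name), count_matches($default_ns, $by_local);
printf "explicit prefix:   name()=%d local-name()=%d\n",
    count_matches($prefixed, $by_name), count_matches($prefixed, $by_local);
```

Both forms match the default-namespace document, but only `local-name()` still matches when the element is written as `p:MyContainer`. Neither form checks the namespace URI, so they can also match same-named elements from unrelated namespaces; registering the namespace remains the more precise option.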

@Dmitre is right. You need to take a look at XML::LibXML::XPathContext, which lets you register a namespace under a prefix and then use that prefix in your XPath expressions. I gave an example of this some time ago on Stack Overflow: see "Why should I use XPathContext with Perl XML::LibXML".


Using XML::LibXML 1.69.

This may be specific to XML::LibXML 1.69, but the odd part is that plain XPath with findnodes() works here: the code below prints the nodes.

    use strict;
    use XML::LibXML;

    my $xml = <<END_XML;
    <?xml version="1.0" encoding="iso-8859-1"?>
    <RootElement>
      <MyContainer xmlns="http://www.w3.org/2000/xmlns/">
        <MyField>
          <Name>ID</Name>
          <Value>12345</Value>
        </MyField>
        <MyField>
          <Name>Name</Name>
          <Value>Ben</Value>
        </MyField>
      </MyContainer>
    </RootElement>
    END_XML

    my $parser = XML::LibXML->new();
    $parser->recover_silently(1);
    my $doc  = $parser->parse_string($xml);
    my $root = $doc->documentElement();

    foreach my $node ($root->findnodes('MyContainer/MyField')) {
        print $node->toString();
    }

But if I change the namespace to something other than "http://www.w3.org/2000/xmlns/", I need XML::LibXML::XPathContext to get the same nodes printed.

    use strict;
    use XML::LibXML;

    my $xml = <<END_XML;
    <?xml version="1.0" encoding="iso-8859-1"?>
    <RootElement>
      <MyContainer xmlns="http://something.org/2000/something/">
        <MyField>
          <Name>ID</Name>
          <Value>12345</Value>
        </MyField>
        <MyField>
          <Name>Name</Name>
          <Value>Ben</Value>
        </MyField>
      </MyContainer>
    </RootElement>
    END_XML

    my $parser = XML::LibXML->new();
    $parser->recover_silently(1);
    my $doc  = $parser->parse_string($xml);
    my $root = $doc->documentElement();

    my $xpc = XML::LibXML::XPathContext->new($root);
    $xpc->registerNs("x", "http://something.org/2000/something/");

    foreach my $node ($xpc->findnodes('x:MyContainer/x:MyField')) {
        print $node->toString();
    }