PHP XPath query on XML with a default namespace binding

I have a solution to the problem in the subject line, but it is a hack, and I am wondering whether there is a better way to do this.

Below are an example XML file and a PHP CLI script that runs an XPath query given as an argument. For this test case, on the command line:

./xpeg "//MainType[@ID=123]" 

The strangest thing is this line, without which my approach does not work:

    $domdoc->loadXML($domdoc->saveXML($domdoc));

As far as I can tell, this just re-parses the modified XML, and it seems to me that this should not be necessary.

Is there a better way to run XPath queries against this XML in PHP?




XML (note the default namespace binding):

    <?xml version="1.0" encoding="utf-8"?>
    <MyRoot
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
      xmlns="http://www.example.com/data">
      <MainType ID="192" comment="Bob site">
        <Price>$0.20</Price>
        <TheUrl><![CDATA[http://www.example.com/path1/]]></TheUrl>
        <Validated>N</Validated>
      </MainType>
      <MainType ID="123" comment="Test site">
        <Price>$99.95</Price>
        <TheUrl><![CDATA[http://www.example.com/path2]]></TheUrl>
        <Validated>N</Validated>
      </MainType>
      <MainType ID="922" comment="Health Insurance">
        <Price>$600.00</Price>
        <TheUrl><![CDATA[http://www.example.com/eg/xyz.php]]></TheUrl>
        <Validated>N</Validated>
      </MainType>
      <MainType ID="389" comment="Used Cars">
        <Price>$5000.00</Price>
        <TheUrl><![CDATA[http://www.example.com/tata.php]]></TheUrl>
        <Validated>N</Validated>
      </MainType>
    </MyRoot>



PHP CLI Script:

    #!/usr/bin/php-cli
    <?php

    $xml = file_get_contents("xpeg.xml");

    $domdoc = new DOMDocument();
    $domdoc->loadXML($xml);

    // remove the default namespace binding
    $e = $domdoc->documentElement;
    $e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue, "");

    // hack hack, cough cough, hack hack
    $domdoc->loadXML($domdoc->saveXML($domdoc));

    $xpath = new DOMXpath($domdoc);

    $str = trim($argv[1]);
    $result = $xpath->query($str);
    if ($result !== FALSE) {
        dump_dom_levels($result);
    } else {
        echo "error\n";
    }

    // The following function isn't really part of the
    // question. It simply provides a concise summary of
    // the result.
    function dump_dom_levels($node, $level = 0) {
        $class = get_class($node);
        if ($class == "DOMNodeList") {
            echo "Level $level ($class): $node->length items\n";
            foreach ($node as $child_node) {
                dump_dom_levels($child_node, $level+1);
            }
        } else {
            $nChildren = 0;
            foreach ($node->childNodes as $child_node) {
                if ($child_node->hasChildNodes()) {
                    $nChildren++;
                }
            }
            if ($nChildren) {
                echo "Level $level ($class): $nChildren children\n";
            }
            foreach ($node->childNodes as $child_node) {
                if ($child_node->hasChildNodes()) {
                    dump_dom_levels($child_node, $level+1);
                }
            }
        }
    }
    ?>
+4
xml php xpath domxpath
Jun 25 '11 at 2:23
4 answers

The solution is to use the namespace, not to get rid of it.

    $result = new DOMDocument();
    $result->loadXML($xml);

    $xpath = new DOMXpath($result);
    $xpath->registerNamespace("x", trim($argv[2]));

    $str = trim($argv[1]);
    $result = $xpath->query($str);

And call it like this on the command line (note the x: prefix in the XPath expression):

 ./xpeg "//x:MainType[@ID=123]" "http://www.example.com/data" 

You could make it more polished by:

  • detecting the default namespace yourself (by looking at the namespaceURI property of the document element)
  • supporting multiple namespaces on the command line and registering them all before calling $xpath->query()
  • supporting arguments of the form xyz=http://namespace.uri/ to create custom namespace prefixes
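
The improvements listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name build_xpath, the fallback prefix x, and the way prefix=uri strings are parsed are all hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): builds a DOMXPath
// with namespaces registered. $bindings is a list of "prefix=uri" strings.
function build_xpath(DOMDocument $doc, array $bindings = [], string $defaultPrefix = 'x'): DOMXPath
{
    $xpath = new DOMXPath($doc);

    // The namespace of the document element. This equals the default
    // namespace only when the root element itself is unprefixed.
    $rootNs = $doc->documentElement->namespaceURI;
    if ($rootNs !== null && $rootNs !== '') {
        $xpath->registerNamespace($defaultPrefix, $rootNs);
    }

    // Register any extra bindings before a query is run.
    foreach ($bindings as $binding) {
        $parts = explode('=', $binding, 2);
        if (count($parts) === 2 && $parts[0] !== '' && $parts[1] !== '') {
            $xpath->registerNamespace($parts[0], $parts[1]);
        }
    }

    return $xpath;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<MyRoot xmlns="http://www.example.com/data">'
    . '<MainType ID="123"><Price>$99.95</Price></MainType>'
    . '</MyRoot>'
);

$xpath = build_xpath($doc, ['d=http://www.example.com/data']);

// Both prefixes are bound to the same URI, so both queries select the same node.
$viaDefault = $xpath->query('//x:MainType[@ID=123]');
$viaCustom  = $xpath->query('//d:MainType[@ID=123]');

echo $viaDefault->length, " ", $viaCustom->length, "\n";
```

The prefix used in the XPath expression does not have to match anything in the document; only the namespace URI it is bound to matters.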

Bottom line: in XPath you cannot query //foo when you really mean //namespace:foo. The two are fundamentally different and therefore select different nodes. The fact that an XML document can declare a default namespace (and can therefore omit explicit namespace prefixes on its elements) does not mean that you can omit the namespace in XPath.
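
To make the difference concrete, here is a minimal sketch that runs both forms of the query against a trimmed-down version of the document. The inline XML and the variable names are assumptions made for the illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<MyRoot xmlns="http://www.example.com/data">'
    . '<MainType ID="123"/>'
    . '</MyRoot>'
);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('x', 'http://www.example.com/data');

// Unprefixed name: looks for MainType in NO namespace, so nothing matches.
$unprefixed = $xpath->query('//MainType[@ID=123]');

// Prefixed name: looks for MainType in http://www.example.com/data.
$prefixed = $xpath->query('//x:MainType[@ID=123]');

echo "unprefixed: ", $unprefixed->length, "\n"; // 0
echo "prefixed: ", $prefixed->length, "\n";     // 1
```

Note that @ID needs no prefix: unprefixed attributes are not placed in the default namespace.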

+11
Jun 25 '11 at 3:14

Just out of curiosity, what happens if you delete this line?

 $e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,""); 

That line strikes me as the most likely reason you need the hack. You are essentially deleting xmlns="http://www.example.com/data" and then building the DOMDocument again. Have you considered simply using string functions to remove that namespace declaration before parsing?

    $pieces = explode('xmlns="', $xml);
    $xml = $pieces[0] . substr($pieces[1], strpos($pieces[1], '"') + 1);

Then continue on your way? It may even be faster.
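
A self-contained sketch of this string-based approach follows. It swaps the explode() call for preg_replace(), which is my substitution rather than the code above, and the inline XML is an assumption for the illustration. Treat it as fragile: the pattern would also match the text xmlns="..." if it ever appeared inside element content or a CDATA section.

```php
<?php
$xml = '<MyRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    . ' xmlns="http://www.example.com/data">'
    . '<MainType ID="123"><Price>$99.95</Price></MainType>'
    . '</MyRoot>';

// Remove only the first default namespace declaration. The pattern requires
// "=" directly after "xmlns", so prefixed declarations such as xmlns:xsi
// are left untouched.
$stripped = preg_replace('/\sxmlns="[^"]*"/', '', $xml, 1);

$doc = new DOMDocument();
$doc->loadXML($stripped);

// With no default namespace left, the unprefixed query now matches.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//MainType[@ID=123]');

echo $nodes->length, "\n";
```

The document is parsed only once here, which avoids the save-and-reload step, at the cost of editing XML as plain text.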

+1
Jun 25 '11

Given the current state of the XPath language, I believe the best answer is the one provided by Tomalek: bind a prefix to the default namespace URI and use that prefix on every element name in the XPath expression. This is the solution I intend to use in my current application.

If that is impossible or impractical, a better solution than my hack is to call a method that does the same thing as re-parsing (hopefully more efficiently): DOMDocument::normalizeDocument(). The method behaves "as if you saved and then loaded the document", putting the document in "normal form".

0
Jun 29 '11

Alternatively, you can use an XPath expression that ignores namespaces by matching on the local name:

 //*[local-name(.) = 'MainType'][@ID='123'] 
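
As a minimal sketch of how that expression could be used from PHP (the inline XML and variable names are assumptions for the illustration), note that no namespace needs to be registered:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<MyRoot xmlns="http://www.example.com/data">'
    . '<MainType ID="192"/>'
    . '<MainType ID="123"/>'
    . '</MyRoot>'
);

$xpath = new DOMXPath($doc);

// local-name() returns the element name without any prefix or namespace,
// so this matches MainType regardless of which namespace it is in.
$matches = $xpath->query("//*[local-name(.) = 'MainType'][@ID='123']");

echo $matches->length, "\n"; // 1
echo $matches->item(0)->getAttribute('ID'), "\n"; // 123
```

The trade-off is that the expression would also match a MainType element from an unrelated namespace, so it is less precise than registering a prefix.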
0
Mar 03 '17 at 12:04


