XPath in SimpleXML for default namespaces without prefixes

I have an XML document with a default namespace attached to it, e.g.

<foo xmlns="http://www.example.com/ns/1.0"> ... </foo> 

This is actually a complex XML document that conforms to a complex schema. My task is to parse some data out of it. To help me, I have been given a table of XPath expressions. The XPaths are fairly deeply nested, for example:

 level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2] 

The person who generates the XPaths is an expert in the schema, so I am working on the assumption that I cannot simplify them or use object traversal shortcuts.

I use SimpleXML to parse everything. My problem is how the default namespace is handled.

Since there is a default namespace in the root element, I cannot just do

    $xml = simplexml_load_file($somepath);
    $node = $xml->xpath('level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2]');

I need to register the namespace, assign it a prefix, and then use the prefix in my XPath, for example:

    $xml = simplexml_load_file($somepath);
    $xml->registerXPathNamespace('myns', 'http://www.example.com/ns/1.0');
    $node = $xml->xpath('myns:level1/myns:level2/myns:level3[@foo="bar"]/myns:level4[@foo="bar"]/myns:level5/myns:level6[2]');

Adding prefixes is not manageable in the long run.

Is there a proper way to handle default namespaces without using prefixes with XPath?

Using an empty prefix does not work ($xml->registerXPathNamespace('', 'http://www.example.com/ns/1.0');). I can strip out the default namespace, for example:

    $xml = file_get_contents($somepath);
    $xml = str_replace('xmlns="http://www.example.com/ns/1.0"', '', $xml);
    $xml = simplexml_load_string($xml);

but this only sidesteps the problem.

3 answers

From a bit of reading online, this is not a limitation of PHP or of any particular library, but of XPath itself, at least in XPath version 1.0.

XPath 1.0 does not include the concept of a default namespace, so regardless of how the element names appear in the XML source, if they have a namespace bound to them, the selectors for them must be prefixed in basic XPath selectors, in the form ns:name. Note that the ns prefix is defined in the XPath processor, not in the document being processed, so it has no relation to how xmlns attributes are used in the XML representation.
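To illustrate the point about prefixes belonging to the XPath processor, here is a small sketch. The two sample documents and the prefix "q" are made up for the example; the only thing that matters is that the URI passed to registerXPathNamespace() matches the namespace URI used in the document.

```php
<?php
// Two documents with the same content: one uses a default namespace,
// the other binds the same URI to the prefix "ex".
$docs = [
    '<foo xmlns="http://www.example.com/ns/1.0"><a><child attr="5"/></a></foo>',
    '<ex:foo xmlns:ex="http://www.example.com/ns/1.0"><ex:a><ex:child attr="5"/></ex:a></ex:foo>',
];

$results = [];
foreach ($docs as $source) {
    $xml = simplexml_load_string($source);
    // "q" is an arbitrary query-side prefix; it appears in neither document.
    $xml->registerXPathNamespace('q', 'http://www.example.com/ns/1.0');
    $results[] = [
        'prefixed'   => count($xml->xpath('q:a/q:child')),
        'unprefixed' => count($xml->xpath('a/child')),
    ];
}

print_r($results);
```

Both documents should give the same counts: the prefixed query finds the child element, while the unprefixed one finds nothing, because an unprefixed name in XPath 1.0 only matches elements in no namespace.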

See, for example, this page of "common XSLT mistakes", which discusses the closely related XSLT 1.0:

To access namespaced elements in XPath, you must define a prefix for your namespace. [...] Unfortunately, XSLT version 1.0 has no concept similar to a default namespace; therefore, you must repeat namespace prefixes again and again.

According to an answer to a similar question, XPath 2.0 does include a notion of "default namespace", and the XSLT page linked above also mentions this in the context of XSLT 2.0.

Unfortunately, all the built-in XML extensions in PHP are built on top of libxml2 and libxslt, which only support version 1.0 of XPath and XSLT.

So, apart from preprocessing the document so that it does not use namespaces, your only option would be to find an XPath 2.0 processor that you could plug in to PHP.

(As an aside, it's worth noting that if you have unprefixed attributes in your XML document, they are technically not in the default namespace, but in no namespace at all; see XML Namespaces and Unprefixed Attributes for a discussion of this oddity of the Namespaces specification.)
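A quick way to see the attribute behaviour is to query the same attribute with and without a prefix. This is only a sketch: the sample document and the prefix "q" are assumptions made for the example.

```php
<?php
$xml = simplexml_load_string(
    '<foo xmlns="http://www.example.com/ns/1.0"><level1 foo="bar"/></foo>'
);
$xml->registerXPathNamespace('q', 'http://www.example.com/ns/1.0');

// The element needs the prefix, but the unprefixed attribute does not.
$plainAttr = count($xml->xpath('q:level1[@foo="bar"]'));

// Prefixing the attribute asks for an attribute *in* the namespace,
// which this document does not contain.
$nsAttr = count($xml->xpath('q:level1[@q:foo="bar"]'));

var_dump($plainAttr, $nsAttr);
```

This is why the predicates in the question's prefixed XPath (such as [@foo="bar"]) stay unprefixed even though every element step gains a prefix.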


Is there a proper way to handle default namespaces without using prefixes with XPath?

No. The proper way to handle any namespace is to associate some value (a prefix) with that namespace so that it can be explicitly selected in the XPath expression. The default namespace is no different.

Think of it this way: an element in some namespace and another element with the same name in a different namespace (or in no namespace at all) are different elements. They can mean (i.e. represent) different things. That's all there is to it. You need to tell XPath which one you want to select; without that, XPath cannot know what you are asking for.

Adding prefixes is not manageable in the long run.

I don't really see why. Whatever is creating your XPath expressions should be capable of producing the correct, prefixed XPath expression (or else it is a broken tool).
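If the XPaths really do arrive without prefixes, one option is to add the prefix mechanically before querying. The helper below is a hypothetical and deliberately naive sketch (the name prefixSteps is invented here, not part of SimpleXML). It assumes the expressions are simple location paths like the one in the question, with no "/" inside predicates or string literals; anything more elaborate would need a real XPath parser.

```php
<?php
/**
 * Prepend "$prefix:" to every location step that starts with a bare element name.
 * Steps that already have a prefix, use an axis (child::), or start with
 * '@', '*', '.', or a function/node test such as text() are left untouched.
 */
function prefixSteps(string $xpath, string $prefix): string
{
    $steps = explode('/', $xpath);
    foreach ($steps as $i => $step) {
        // The possessive quantifier (*+) stops the name from being shortened
        // by backtracking, so "text()" is not mistaken for an element name.
        if (preg_match('/^[A-Za-z_][\w.-]*+(?![:(])/', $step)) {
            $steps[$i] = $prefix . ':' . $step;
        }
    }
    return implode('/', $steps);
}

$prefixed = prefixSteps(
    'level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2]',
    'myns'
);
echo $prefixed, "\n";
```

The result can then be passed to xpath() after calling registerXPathNamespace('myns', ...) as in the question, so the table of XPaths itself never has to be edited by hand.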

You might be thinking, "why can't I just ignore the namespace and get all the elements matching that name?" There are some really hacky ways to do this (such as the XSLT-based answer), but they are broken by design. An element in XML is identified by the combination of its namespace and its local name, just as your home can be identified by a street address (the local name) within a city and state (the namespace). If I tell you that I live at 422 Main Street, you still don't know where I live until I tell you which city and state.

You may still be thinking: "enough with the silly analogies, I really, really want to do this." You can select elements with a given name in any namespace by matching only the local part of the element name, for example:

    *[local-name()='level1']/*[local-name()='level2']/*[local-name()='level3' and @foo="bar"]/*[local-name()='level4' and @foo="bar"]/*[local-name()='level5']/*[local-name()='level6'][2]

Note that this is not limited to the default namespace. It ignores namespaces completely. It is ugly and I do not recommend it, but sometimes you just want to ignore best practice and get something done.
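To make the "ignores namespaces completely" caveat concrete, here is a small sketch. The document, the second namespace urn:example:other, and the prefix "q" are all made up for the example.

```php
<?php
$xml = simplexml_load_string(
    '<root xmlns="http://www.example.com/ns/1.0" xmlns:o="urn:example:other">'
    . '<item>wanted</item><o:item>other namespace</o:item>'
    . '</root>'
);
$xml->registerXPathNamespace('q', 'http://www.example.com/ns/1.0');

// local-name() matches both <item> elements, whatever their namespace.
$byLocalName = count($xml->xpath("*[local-name()='item']"));

// A registered prefix matches only the element in that namespace.
$byPrefix = count($xml->xpath('q:item'));

// Adding namespace-uri() restores the restriction without a prefix,
// at the cost of an even longer expression.
$byUri = count($xml->xpath(
    "*[local-name()='item' and namespace-uri()='http://www.example.com/ns/1.0']"
));

var_dump($byLocalName, $byPrefix, $byUri);
```

If the document can ever contain same-named elements from another namespace, the local-name() form silently picks them up as well, which is the main reason to prefer a registered prefix.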

By the way, this is not a PHP bug. It is what the XPath specification requires: you must specify a prefix to select a node in a namespace. If PHP let you do this some other way, then whatever they called it, it would no longer be XPath (according to the specification).


To avoid hacks such as the str_replace you have (which I would recommend avoiding), you could run the XML files through an XSLT that strips the namespace:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:myns="http://www.example.com/ns/1.0">
      <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
      </xsl:template>

      <xsl:template match="myns:*">
        <xsl:element name="{local-name()}">
          <xsl:apply-templates select="@* | node()" />
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>

When run on either of these inputs:

    <foo xmlns="http://www.example.com/ns/1.0">
      <a>
        <child attr="5"></child>
      </a>
    </foo>

    <ex:foo xmlns:ex="http://www.example.com/ns/1.0">
      <ex:a>
        <ex:child attr="5"></ex:child>
      </ex:a>
    </ex:foo>

The output is the same:

    <foo>
      <a>
        <child attr="5" />
      </a>
    </foo>

This will allow you to use your unprefixed XPaths on the result.
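For completeness, this is roughly how the transform could be wired into PHP before querying with SimpleXML. It is a sketch that assumes the XSL extension (ext-xsl, which provides XSLTProcessor) is enabled; the variable names and the inline sample document are illustrative only.

```php
<?php
// The namespace-stripping stylesheet from above, embedded as a string.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:myns="http://www.example.com/ns/1.0">
  <xsl:template match="@* | node()">
    <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="myns:*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
XSL;

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xslSource);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);

$input = new DOMDocument();
$input->loadXML(
    '<foo xmlns="http://www.example.com/ns/1.0"><a><child attr="5"/></a></foo>'
);

// Transform to a new DOMDocument, then hand it to SimpleXML.
$stripped = simplexml_import_dom($processor->transformToDoc($input));

// The unprefixed XPath now matches, because the elements are in no namespace.
$matches = $stripped->xpath('a/child[@attr="5"]');
var_dump(count($matches));
```

Unlike the str_replace approach, this works whether the source document uses a default namespace or an explicit prefix for the same URI, at the cost of an extra pass over the document.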
