I am currently building a NodeList of all nodes in a document (in document order) manually. The XPath expression that yields this NodeList is:
//. | //@* | //namespace::*
My first attempt to walk the DOM manually and collect the nodes (NodeSet is a primitive NodeList implementation that delegates to a List):

    // Adds cur, then its attributes, then its children, in document order.
    private static void walkRecursive(Node cur, NodeSet nodes) {
        nodes.add(cur);
        if (cur.hasAttributes()) {
            NamedNodeMap attrs = cur.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node child = attrs.item(i);
                walkRecursive(child, nodes);
            }
        }
        int type = cur.getNodeType();
        // Only elements and the document itself are descended into.
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            if (children == null) return;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                walkRecursive(child, nodes);
            }
        }
    }

I start the recursion by calling walkRecursive(doc, nodes), where doc is an org.w3c.dom.Document and nodes is an initially empty NodeSet.
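To make the walk reproducible without my NodeSet class, here is a minimal self-contained sketch of the same idea. The class name DomWalkSketch, the helper names, and the use of a plain ArrayList instead of NodeSet are assumptions made for illustration; the sample string is the test document written without whitespace so that no text nodes show up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomWalkSketch {
    // Hypothetical sample input: the test document without whitespace.
    static final String SAMPLE =
        "<myns:root xmlns:myns=\"http://www.my.ns/#\"><myns:element/></myns:root>";

    // Parses a string with a namespace-aware parser; checked exceptions are wrapped.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same order as walkRecursive: the node, then its attributes, then its children.
    static void walk(Node cur, List<Node> out) {
        out.add(cur);
        if (cur.hasAttributes()) {
            NamedNodeMap attrs = cur.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                walk(attrs.item(i), out);
            }
        }
        int type = cur.getNodeType();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), out);
            }
        }
    }

    // Convenience wrapper that starts the recursion with an empty list.
    static List<Node> collect(Document doc) {
        List<Node> out = new ArrayList<>();
        walk(doc, out);
        return out;
    }

    public static void main(String[] args) {
        for (Node n : collect(parse(SAMPLE))) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

Note that with this approach only the declarations actually present in the source (here xmlns:myns) are collected, because they are ordinary attributes in a namespace-aware DOM.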
I tested this using this primitive XML document:
    <?xml version="1.0"?>
    <myns:root xmlns:myns="http://www.my.ns/#">
      <myns:element/>
    </myns:root>
If I canonicalize both the manually built NodeSet and the NodeList produced by the XPath expression above and compare the two results byte for byte, they are equal, so this seems to work fine.
But if I iterate over the two NodeLists and print debugging information (typeString just generates a string representation of the node type)
    for (int i = 0; i < nodes.getLength(); i++) {
        Node child = nodes.item(i);
        System.out.println("Type: " + typeString(child.getNodeType())
                + " Name:" + child.getNodeName()
                + " Local name: " + child.getLocalName()
                + " NS: " + child.getNamespaceURI());
    }
then I get this output for the XPath-generated NodeList:
    Type: DocumentNode Name:#document Local name: null NS: null
    Type: Element Name:myns:root Local name: root NS: http:
and this for the manually created NodeList:
    Type: DocumentNode Name:#document Local name: null NS: null
    Type: Element Name:myns:root Local name: root NS: http:
So the NodeList generated by XPath additionally contains a node for the xml namespace:
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
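As a workaround for comparing the two lists, the extra node could be filtered out of the XPath result. The following is only a sketch under assumptions: the class and method names are made up, and the filter relies on the node being reported as an attribute named xmlns:xml, which is what I observe above and may differ with other XPath implementations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlNamespaceFilterSketch {
    // Hypothetical sample input: the test document without whitespace.
    static final String SAMPLE =
        "<myns:root xmlns:myns=\"http://www.my.ns/#\"><myns:element/></myns:root>";

    // Assumption: the implicit declaration shows up as an attribute named "xmlns:xml".
    static boolean isXmlPrefixDeclaration(Node n) {
        return n.getNodeType() == Node.ATTRIBUTE_NODE
            && "xmlns:xml".equals(n.getNodeName());
    }

    // Evaluates the XPath expression from the question and drops xmlns:xml nodes.
    static List<Node> allNodesWithoutXmlPrefix(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//. | //@* | //namespace::*", doc, XPathConstants.NODESET);
            List<Node> kept = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                Node n = found.item(i);
                if (!isXmlPrefixDeclaration(n)) {
                    kept.add(n);
                }
            }
            return kept;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Node n : allNodesWithoutXmlPrefix(SAMPLE)) {
            System.out.println(n.getNodeName());
        }
    }
}
```

This filter would also drop an xmlns:xml declaration written explicitly in the document, which should be harmless if such declarations are omitted during canonicalization anyway.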
Now my questions are:
a) If I interpret xml-names11 correctly, I do not need an xmlns:xml declaration:
The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be undeclared or bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.
Am I right? (At least c) hints in that direction.)
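One way to check this reading empirically is to parse a document that uses the xml prefix without declaring it. The sketch below assumes a made-up one-element document and hypothetical class and method names; it only uses the standard JAXP and DOM APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;

class XmlPrefixSketch {
    // Parses the given XML with a namespace-aware parser and returns the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: xml:lang is used, but xmlns:xml is never declared.
        Element root = parseRoot("<root xml:lang=\"en\"/>");
        Attr lang = root.getAttributeNodeNS(XMLConstants.XML_NS_URI, "lang");
        System.out.println("attribute count: " + root.getAttributes().getLength());
        System.out.println("xml:lang namespace: " + lang.getNamespaceURI());
    }
}
```

If the parser accepts the document and reports the XML namespace for xml:lang, the prefix is bound without any declaration in the source, which matches the quoted passage.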
b) But then why does the XPath evaluation add it anyway? Shouldn't it include only what was there in the first place, instead of automatically adding things?
c) This could cause problems with XML canonicalization, although it shouldn't, since declarations of the xml namespace are supposed to be omitted during canonicalization. Does anyone know of (Java) implementations that get this wrong?
Edit:
Here is the code I used to evaluate the XPath expression whose result contains the node for the xml namespace:
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    dbf.setValidating(false);
    InputStream in = ...;
    try {
        Document doc = dbf.newDocumentBuilder().parse(in);
        XPathFactory fac = XPathFactory.newInstance();
        XPath xp = fac.newXPath();
        XPathExpression exp = xp.compile("//. | //@* | //namespace::*");
        NodeList nodes = (NodeList) exp.evaluate(doc, XPathConstants.NODESET);
    } finally {
        in.close();
    }