XPath, XML Namespaces and Java

I spent the last day trying to extract one XML node from the following document, but I do not understand the nuances of XML namespaces well enough to make it work.

The XML file is too large to post in full, so here is the part that interests me:

    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
    <XFDL xmlns="http://www.PureEdge.com/XFDL/6.5"
          xmlns:custom="http://www.PureEdge.com/XFDL/Custom"
          xmlns:designer="http://www.PureEdge.com/Designer/6.1"
          xmlns:pecs="http://www.PureEdge.com/PECustomerService"
          xmlns:xfdl="http://www.PureEdge.com/XFDL/6.5">
      <globalpage sid="global">
        <global sid="global">
          <xmlmodel xmlns:xforms="http://www.w3.org/2003/xforms">
            <instances>
              <xforms:instance id="metadata">
                <form_metadata>
                  <metadataver version="1.0"/>
                  <metadataverdate>
                    <date day="05" month="Jul" year="2005"/>
                  </metadataverdate>
                  <title>
                    <documentnbr number="2062" prefix.army="DA" scope="army" suffix=""/>
                    <longtitle>HAND RECEIPT/ANNEX NUMBER </longtitle>
                  </title>

The document continues and is well-formed through to the end. I am trying to extract the "number" attribute from the "documentnbr" tag (third from the bottom).

The code I use for this is as follows:

    /**
     * Locates the Document Number information in the file and returns the form number.
     * @return File self-declared number.
     * @throws InvalidFormException Thrown when XPath cannot find the "documentnbr" element in the file.
     */
    public String getFormNumber() throws InvalidFormException {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new XFDLNamespaceContext());

            Node result = (Node) xPath.evaluate(QUERY_FORM_NUMBER, doc, XPathConstants.NODE);
            if (result != null) {
                return result.getNodeValue();
            } else {
                throw new InvalidFormException("Unable to identify form.");
            }
        } catch (XPathExpressionException err) {
            throw new InvalidFormException("Unable to find form number in file.");
        }
    }

Where QUERY_FORM_NUMBER is an XPath expression, and XFDLNamespaceContext implements NamespaceContext and looks like this:

    public class XFDLNamespaceContext implements NamespaceContext {

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null)
                throw new NullPointerException("Invalid Namespace Prefix");
            else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
                return "http://www.PureEdge.com/XFDL/6.5";
            else if ("custom".equals(prefix))
                return "http://www.PureEdge.com/XFDL/Custom";
            else if ("designer".equals(prefix))
                return "http://www.PureEdge.com/Designer/6.1";
            else if ("pecs".equals(prefix))
                return "http://www.PureEdge.com/PECustomerService";
            else if ("xfdl".equals(prefix))
                return "http://www.PureEdge.com/XFDL/6.5";
            else if ("xforms".equals(prefix))
                return "http://www.w3.org/2003/xforms";
            else
                return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String arg0) {
            // TODO Auto-generated method stub
            return null;
        }

        @Override
        public Iterator getPrefixes(String arg0) {
            // TODO Auto-generated method stub
            return null;
        }
    }

I have tried many different XPath queries, but I feel this should work:

    protected static final String QUERY_FORM_NUMBER =
        "/globalpage/global/xmlmodel/xforms:instances/instance"
        + "/form_metadata/title/documentnbr[number]";

Unfortunately, this does not work, and I keep getting a null result.

I have done quite a bit of reading here , here , and here , but nothing has proved illuminating enough to help me get this working.

I am fairly sure I will face-palm when I see the answer, but I am at my wits' end as to what I am missing.

Thank you for reading all this and in advance for your help.

-Andy

+6
java xpath xml-namespaces xfdl
3 answers

Aha, I debugged your expression and got it to work. You missed a few things. This XPath expression should do it:

 /XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number 
  • You need to include the root element (XFDL in this case).
  • For some reason, I did not have to use any namespaces in the expression, and I do not know why. In that case, NamespaceContext.getNamespaceURI() is never called. If you replace instance with xforms:instance, getNamespaceURI() is called once with xforms as the input argument, but the program throws an exception.
  • The syntax for selecting an attribute is @attr, not [attr].
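
A plausible explanation for the second point, assuming the document is parsed with a default DocumentBuilderFactory as in the complete example: the factory is not namespace-aware unless setNamespaceAware(true) is called, so the DOM carries no namespace information and unprefixed names match. The sketch below contrasts the two modes. The class name NamespaceAwareDemo, the shortened XML and the prefix x are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is a shortened stand-in for the form in the question.
class NamespaceAwareDemo {
    static final String XFDL_NS = "http://www.PureEdge.com/XFDL/6.5";
    static final String XML =
        "<XFDL xmlns=\"" + XFDL_NS + "\">"
        + "<title><documentnbr number=\"2062\"/></title>"
        + "</XFDL>";

    // Parses the sample XML with namespace processing switched on or off.
    static Document parse(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the factory default is false
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates an expression as a string, with the made-up prefix "x" bound to the XFDL namespace.
    static String eval(Document doc, String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XFDL_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return XFDL_NS.equals(uri) ? "x" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return XFDL_NS.equals(uri)
                        ? Collections.singletonList("x").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return xPath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String plain = "/XFDL/title/documentnbr/@number";
        String prefixed = "/x:XFDL/x:title/x:documentnbr/@number";
        System.out.println("not namespace-aware, unprefixed: " + eval(parse(false), plain));
        System.out.println("namespace-aware, unprefixed:     " + eval(parse(true), plain));
        System.out.println("namespace-aware, prefixed:       " + eval(parse(true), prefixed));
    }
}
```

Once the parser is namespace-aware, an unprefixed name in XPath 1.0 refers to no namespace at all, so the unprefixed expression should come back empty and a prefix has to be bound to the document's default namespace URI. The prefix used in the expression does not need to match any prefix in the document; only the URI matters.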

My complete code example:

    import java.io.File;
    import java.io.IOException;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    import javax.xml.XMLConstants;
    import javax.xml.namespace.NamespaceContext;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.xml.sax.SAXException;

    public class XPathNamespaceExample {

        static public class MyNamespaceContext implements NamespaceContext {
            final private Map<String, String> prefixMap;

            MyNamespaceContext(Map<String, String> prefixMap) {
                if (prefixMap != null) {
                    this.prefixMap = Collections.unmodifiableMap(new HashMap<String, String>(prefixMap));
                } else {
                    this.prefixMap = Collections.emptyMap();
                }
            }

            public String getPrefix(String namespaceURI) {
                // TODO Auto-generated method stub
                return null;
            }

            public Iterator getPrefixes(String namespaceURI) {
                // TODO Auto-generated method stub
                return null;
            }

            public String getNamespaceURI(String prefix) {
                if (prefix == null)
                    throw new NullPointerException("Invalid Namespace Prefix");
                else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
                    return "http://www.PureEdge.com/XFDL/6.5";
                else if ("custom".equals(prefix))
                    return "http://www.PureEdge.com/XFDL/Custom";
                else if ("designer".equals(prefix))
                    return "http://www.PureEdge.com/Designer/6.1";
                else if ("pecs".equals(prefix))
                    return "http://www.PureEdge.com/PECustomerService";
                else if ("xfdl".equals(prefix))
                    return "http://www.PureEdge.com/XFDL/6.5";
                else if ("xforms".equals(prefix))
                    return "http://www.w3.org/2003/xforms";
                else
                    return XMLConstants.NULL_NS_URI;
            }
        }

        protected static final String QUERY_FORM_NUMBER =
            "/XFDL/globalpage/global/xmlmodel/xforms:instances/instance"
            + "/form_metadata/title/documentnbr[number]";

        public static void main(String[] args) {
            try {
                DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
                DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
                Document doc = docBuilder.parse(new File(args[0]));

                System.out.println(extractNodeValue(doc, "/XFDL/globalpage/@sid"));
                System.out.println(extractNodeValue(doc,
                    "/XFDL/globalpage/global/xmlmodel/instances/instance/@id"));
                System.out.println(extractNodeValue(doc,
                    "/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number"));
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            }
        }

        private static String extractNodeValue(Document doc, String expression) {
            try {
                XPath xPath = XPathFactory.newInstance().newXPath();
                xPath.setNamespaceContext(new MyNamespaceContext(null));

                Node result = (Node) xPath.evaluate(expression, doc, XPathConstants.NODE);
                if (result != null) {
                    return result.getNodeValue();
                } else {
                    throw new RuntimeException("can't find expression");
                }
            } catch (XPathExpressionException err) {
                throw new RuntimeException(err);
            }
        }
    }
+5

SAX Version (XPath Alternative):

    SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
    final String[] number = new String[1];

    DefaultHandler handler = new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            if (qName.equals("documentnbr"))
                number[0] = attributes.getValue("number");
        }
    };

    saxParser.parse("input.xml", handler);
    System.out.println(number[0]);
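
The handler above matches on qName, which fits the factory defaults, since a SAXParserFactory is not namespace-aware unless told otherwise. If the parser is switched to namespace-aware mode, matching on the namespace URI plus the local name is probably the more robust choice. A sketch under that assumption; the class name SaxFormNumber, the formNumber method and the shortened sample XML are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: namespace-aware SAX lookup of documentnbr/@number.
class SaxFormNumber {
    static final String XFDL_NS = "http://www.PureEdge.com/XFDL/6.5";

    // Shortened stand-in for the form in the question.
    static final String SAMPLE =
        "<XFDL xmlns=\"" + XFDL_NS + "\">"
        + "<title><documentnbr number=\"2062\" scope=\"army\"/></title>"
        + "</XFDL>";

    // Returns the number attribute of the first XFDL documentnbr element, or null if none is found.
    static String formNumber(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report namespace URIs and local names
        SAXParser parser = factory.newSAXParser();

        final String[] number = new String[1];
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // Compare URI and local name instead of the prefixed qName.
                if (number[0] == null && XFDL_NS.equals(uri) && "documentnbr".equals(localName)) {
                    // Unprefixed attributes are in no namespace, hence the empty URI.
                    number[0] = attributes.getValue("", "number");
                }
            }
        });
        return number[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formNumber(SAMPLE));
    }
}
```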

Using XPath with namespaces is harder than it should be (in my opinion). Here is my (simple) code:

    XPath xpath = XPathFactory.newInstance().newXPath();

    NamespaceContextMap contextMap = new NamespaceContextMap();
    contextMap.put("custom", "http://www.PureEdge.com/XFDL/Custom");
    contextMap.put("designer", "http://www.PureEdge.com/Designer/6.1");
    contextMap.put("pecs", "http://www.PureEdge.com/PECustomerService");
    contextMap.put("xfdl", "http://www.PureEdge.com/XFDL/6.5");
    contextMap.put("xforms", "http://www.w3.org/2003/xforms");
    contextMap.put("", "http://www.PureEdge.com/XFDL/6.5");
    xpath.setNamespaceContext(contextMap);

    String expression = "//:documentnbr/@number";
    InputSource inputSource = new InputSource("input.xml");

    String number;
    number = (String) xpath.evaluate(expression, inputSource, XPathConstants.STRING);
    System.out.println(number);

You can get the NamespaceContextMap class (not mine) from here (GPL license). There is also bug 6376058.
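
If pulling in a GPL-licensed class is not an option, a map-backed NamespaceContext is short enough to write by hand. The following is a minimal sketch, not the NamespaceContextMap class mentioned above: SimpleNamespaceContext, its bind method, the demo class and the shortened XML are hypothetical names for illustration. It parses with an explicitly namespace-aware DocumentBuilderFactory and binds the document's default namespace to an ordinary prefix instead of the empty string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical map-backed NamespaceContext for use with javax.xml.xpath.
final class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    // Registers a prefix-to-URI binding and returns this for chaining.
    SimpleNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) return XMLConstants.XML_NS_URI;
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) throw new IllegalArgumentException("namespaceURI must not be null");
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) matches.add(entry.getKey());
        }
        return matches.iterator();
    }
}

// Hypothetical usage against a shortened stand-in for the form in the question.
class SimpleNamespaceContextDemo {
    static final String SAMPLE =
        "<XFDL xmlns=\"http://www.PureEdge.com/XFDL/6.5\""
        + " xmlns:xforms=\"http://www.w3.org/2003/xforms\">"
        + "<xforms:instance id=\"metadata\">"
        + "<title><documentnbr number=\"2062\"/></title>"
        + "</xforms:instance></XFDL>";

    static String formNumber(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefixed XPath names to match
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new SimpleNamespaceContext()
                .bind("xfdl", "http://www.PureEdge.com/XFDL/6.5")
                .bind("xforms", "http://www.w3.org/2003/xforms"));
        return xpath.evaluate("//xfdl:documentnbr/@number", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formNumber(SAMPLE));
    }
}
```

Binding the default namespace to a named prefix such as xfdl sidesteps the question of how an implementation treats an empty prefix in an expression.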

+3

Take a look at the XPathAPI library. This is an easier way to use XPath without interacting with the low-level Java API, especially when working with namespaces.

Code to get the number attribute:

    String num = XPathAPI.selectSingleNodeAsString(doc, "//documentnbr/@number");

Namespaces are automatically extracted from the root node ( doc in this case). If you need to define additional namespaces explicitly, you can use this:

    Map<String, String> nsMap = new HashMap<String, String>();
    nsMap.put("xforms", "http://www.w3.org/2003/xforms");

    String num = XPathAPI.selectSingleNodeAsString(doc, "//documentnbr/@number", nsMap);

(Disclaimer: I am the author of the library.)

+2
