Parse the XML file to get all the namespace information

I want to be able to get all the namespace information from a given XML file.

So, for example, if the input XML file looks something like this:

    <ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
      <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>1</ns1:id>
        <description>bar</description>
        <name>foo</name>
        <ns1:price>
          <amount>00.00</amount>
          <currency>USD</currency>
        </ns1:price>
        <ns1:price>
          <amount>11.11</amount>
          <currency>AUD</currency>
        </ns1:price>
      </ns1:article>
      <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>2</ns1:id>
        <description>some name</description>
        <name>some description</name>
        <ns1:price>
          <amount>00.01</amount>
          <currency>USD</currency>
        </ns1:price>
      </ns1:article>
    </ns1:create>

I would expect output that looks something like this (in this case, comma-separated):

    create, ns1, http://predic8.com/wsdl/material/ArticleService/1/
    article, ns1, http://predic8.com/material/1/
    price, ns1, http://predic8.com/material/1/
    id, ns1, http://predic8.com/material/1/

Important notes:

Sub-nodes that use a namespace prefix must be reported even when the prefix is declared on an ancestor node rather than on the sub-node itself. For example, we still want to report the node ns1:id, which means looking up to the parent node ns1:article to find that the namespace URL is xmlns:ns1='http://predic8.com/material/1/

I am implementing this in Java, so a Java solution would be ideal, but an XSLT-based solution would also be fine.

3 answers

A further refinement of the XPath expression proposed by Michael Kay (which appears to be simplified) that also handles unprefixed element names belonging to a default namespace:

    distinct-values(//*[namespace-uri()]
                      /concat(local-name(), ', ',
                              substring-before(name(), ':'), ', ',
                              namespace-uri(), '&#xA;'
                             )
                   )

When this XPath expression is evaluated against the following document (the provided one, with one added element that is in a default namespace):

    <ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
      <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>1</ns1:id>
        <description>bar</description>
        <name>foo</name>
        <ns1:price>
          <amount>00.00</amount>
          <currency>USD</currency>
        </ns1:price>
        <ns1:price>
          <amount>11.11</amount>
          <currency>AUD</currency>
        </ns1:price>
      </ns1:article>
      <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>2</ns1:id>
        <description>some name</description>
        <name>some description</name>
        <ns1:price>
          <amount>00.01</amount>
          <currency>USD</currency>
        </ns1:price>
        <quality xmlns="my:q">high</quality>
      </ns1:article>
    </ns1:create>

the wanted, correct result is produced:

    create, ns1, http://predic8.com/wsdl/material/ArticleService/1/
    article, ns1, xmlns:ns1='http://predic8.com/material/1/
    id, ns1, xmlns:ns1='http://predic8.com/material/1/
    price, ns1, xmlns:ns1='http://predic8.com/material/1/
    quality, , my:q
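
For readers without an XPath 2.0 processor, roughly the same listing can be produced with the JDK's namespace-aware DOM parser. The following is only a sketch: the class name DomNamespaceLister, its list method, and the tiny sample document in main are made up for illustration and are not part of the answer above.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: mirrors the filter [namespace-uri()] with a DOM walk.
final class DomNamespaceLister {

    // Returns one "localName, prefix, namespaceURI" row per distinct combination.
    static Set<String> list(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace support is off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Set<String> rows = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
            NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                String uri = e.getNamespaceURI();
                if (uri == null || uri.isEmpty()) {
                    continue; // element is in no namespace
                }
                // getPrefix() is null for elements in a default namespace
                String prefix = e.getPrefix() == null ? "" : e.getPrefix();
                rows.add(e.getLocalName() + ", " + prefix + ", " + uri);
            }
            return rows;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only, not the document from the question.
        String xml = "<a:root xmlns:a='urn:a'><a:child/><plain/><q xmlns='urn:q'/></a:root>";
        list(xml).forEach(System.out::println);
    }
}
```

The LinkedHashSet plays the role of distinct-values() here; the unprefixed q element is reported with an empty prefix, as in the XPath result above.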

Another minor improvement is to also produce namespace data for attribute names:

    distinct-values(//(*|@*)[namespace-uri()]
                      /concat(if(. intersect ../@*)
                                then '@'
                                else (),
                              local-name(), ', ',
                              substring-before(name(), ':'), ', ',
                              namespace-uri(), '&#xA;'
                             )
                   )

When this XPath expression is evaluated against the following XML document (the previous one, with an xml:lang attribute added to one of the article elements):

    <ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
      <ns1:article xml:lang="en-us" xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>1</ns1:id>
        <description>bar</description>
        <name>foo</name>
        <ns1:price>
          <amount>00.00</amount>
          <currency>USD</currency>
        </ns1:price>
        <ns1:price>
          <amount>11.11</amount>
          <currency>AUD</currency>
        </ns1:price>
      </ns1:article>
      <ns1:article xmlns:ns1="xmlns:ns1='http://predic8.com/material/1/">
        <ns1:id>2</ns1:id>
        <description>some name</description>
        <name>some description</name>
        <ns1:price>
          <amount>00.01</amount>
          <currency>USD</currency>
        </ns1:price>
        <quality xmlns="my:q">high</quality>
      </ns1:article>
    </ns1:create>

again the correct result is obtained:

    create, ns1, http://predic8.com/wsdl/material/ArticleService/1/
    article, ns1, xmlns:ns1='http://predic8.com/material/1/
    @lang, xml, http://www.w3.org/XML/1998/namespace
    id, ns1, xmlns:ns1='http://predic8.com/material/1/
    price, ns1, xmlns:ns1='http://predic8.com/material/1/
    quality, , my:q
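
Since the question mentions Java, here is a rough sketch of how the element-plus-attribute listing might be approximated with the JDK's StAX reader instead of XPath 2.0. The class name StaxNamespaceLister, the addRow helper, and the sample input are assumptions made for illustration; only the XMLStreamReader calls are standard API.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: lists namespaced element and attribute names via StAX.
final class StaxNamespaceLister {

    static Set<String> list(String xml) {
        Set<String> rows = new LinkedHashSet<>(); // distinct rows, first-seen order
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                addRow(rows, "", reader.getLocalName(),
                        reader.getPrefix(), reader.getNamespaceURI());
                // xmlns declarations are not reported as attributes by StAX.
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    addRow(rows, "@", reader.getAttributeLocalName(i),
                            reader.getAttributePrefix(i), reader.getAttributeNamespace(i));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return rows;
    }

    // Adds "marker + local, prefix, uri" unless the name is in no namespace.
    private static void addRow(Set<String> rows, String marker,
                               String local, String prefix, String uri) {
        if (uri == null || uri.isEmpty()) {
            return;
        }
        rows.add(marker + local + ", " + (prefix == null ? "" : prefix) + ", " + uri);
    }

    public static void main(String[] args) {
        // Illustrative input only.
        String xml = "<a:root xmlns:a='urn:a' xml:lang='en'><a:child id='1'/></a:root>";
        list(xml).forEach(System.out::println);
    }
}
```

Unprefixed attributes such as id are in no namespace, so they are skipped, just as the [namespace-uri()] predicate skips them.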

I would use the built-in XMLStreamReader, which is the interface implemented by streaming XML parsers (you obtain one from an XMLInputFactory). Its getName method returns a QName, which should give you everything you need.

Something along the lines of:

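    import java.io.File;
    import java.io.FileInputStream;
    import java.util.HashSet;
    import java.util.Set;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class ListNamespaces {
        public static void main(String[] args) throws Exception {
            File file = new File("samples/sample11.xml");
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            XMLStreamReader reader = inputFactory.createXMLStreamReader(new FileInputStream(file));
            Set<String> namespaces = new HashSet<String>(); // a set, so each row appears once
            while (reader.hasNext()) {
                int evt = reader.next();
                if (evt == XMLStreamConstants.START_ELEMENT) {
                    QName qName = reader.getName();
                    // Only prefixed element names are recorded.
                    if (qName != null && qName.getPrefix() != null && !qName.getPrefix().isEmpty()) {
                        namespaces.add(String.format("%s, %s, %s",
                                qName.getLocalPart(), qName.getPrefix(), qName.getNamespaceURI()));
                    }
                }
            }
            reader.close();
            for (String namespace : namespaces) {
                System.out.println(namespace);
            }
        }
    }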

This can be done with a single XPath 2.0 expression:

    distinct-values(//*[name() != local-name()]
                      /concat(local-name(), ', ',
                              substring-before(name(), ':'), ', ',
                              namespace-uri()))
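
Note that distinct-values() and the path step ending in concat() need an XPath 2.0 processor; the javax.xml.xpath implementation bundled with the JDK supports XPath 1.0. One possible workaround, sketched below under that assumption, is to select the prefixed elements with the XPath 1.0 part of the expression and do the concatenation and de-duplication in Java. The class name XPath1NamespaceLister and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: XPath 1.0 selection, with formatting and de-duplication done in Java.
final class XPath1NamespaceLister {

    static Set<String> list(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getLocalName()/getPrefix()
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Same predicate as in the answer: keep only elements whose name has a prefix.
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[name() != local-name()]", doc, XPathConstants.NODESET);

            Set<String> rows = new LinkedHashSet<>(); // stands in for distinct-values()
            for (int i = 0; i < hits.getLength(); i++) {
                Node n = hits.item(i);
                rows.add(n.getLocalName() + ", " + n.getPrefix() + ", " + n.getNamespaceURI());
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only.
        String xml = "<a:root xmlns:a='urn:a'><a:child/><a:child/><plain/></a:root>";
        list(xml).forEach(System.out::println);
    }
}
```

Because the predicate compares name() with local-name(), elements in a default namespace are not reported by this variant.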
