How to flatten an XML file into a set of XPath expressions?

I have the following example XML file:

    <ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
      <article xmlns:ns1='http://predic8.com/material/1/'>
        <name xmlns:ns1='http://predic8.com/material/1/'>foo</name>
        <description xmlns:ns1='http://predic8.com/material/1/'>bar</description>
        <price xmlns:ns1='http://predic8.com/common/1/'>
          <amount xmlns:ns1='http://predic8.com/common/1/'>00.00</amount>
          <currency xmlns:ns1='http://predic8.com/common/1/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'>1</id>
      </article>
    </ns1:create>

What would be the best (most efficient) way to flatten this into a set of XPath expressions? Note that I want to ignore any namespace and attribute information. (If necessary, this can be done as a preprocessing step.)

So, I want to get the following output:

    /create/article/name
    /create/article/description
    /create/article/price/amount
    /create/article/price/currency
    /create/article/id

I am implementing this in Java.

EDIT: PS, I might also need this to work when the nodes contain no text. For example, the following should generate the same result as above:

    <ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
      <article xmlns:ns1='http://predic8.com/material/1/'>
        <name />
        <description />
        <price xmlns:ns1='http://predic8.com/common/1/'>
          <amount />
          <currency xmlns:ns1='http://predic8.com/common/1/'></currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'></id>
      </article>
    </ns1:create>
2 answers

You can do this quite easily with XSLT. Looking at your examples, it seems that you only want XPaths for elements that contain text. If not, let me know and I can update the XSLT.

I created a new input example to show how it handles siblings with the same name (in this case, <article>).

XML input

    <ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
      <article xmlns:ns1='http://predic8.com/material/1/'>
        <name xmlns:ns1='http://predic8.com/material/1/'>foo</name>
        <description xmlns:ns1='http://predic8.com/material/1/'>bar</description>
        <price xmlns:ns1='http://predic8.com/common/1/'>
          <amount xmlns:ns1='http://predic8.com/common/1/'>00.00</amount>
          <currency xmlns:ns1='http://predic8.com/common/1/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'>1</id>
      </article>
      <article xmlns:ns1='http://predic8.com/material/2/'>
        <name xmlns:ns1='http://predic8.com/material/2/'>some name</name>
        <description xmlns:ns1='http://predic8.com/material/2/'>some description</description>
        <price xmlns:ns1='http://predic8.com/common/2/'>
          <amount xmlns:ns1='http://predic8.com/common/2/'>00.01</amount>
          <currency xmlns:ns1='http://predic8.com/common/2/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/2/'>2</id>
      </article>
    </ns1:create>

XSLT 1.0

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="text()"/>

      <xsl:template match="*[text()]">
        <xsl:call-template name="genPath"/>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:template>

      <xsl:template name="genPath">
        <xsl:param name="prevPath"/>
        <xsl:variable name="currPath" select="concat('/',local-name(),'[',
          count(preceding-sibling::*[name() = name(current())])+1,']',$prevPath)"/>
        <xsl:for-each select="parent::*">
          <xsl:call-template name="genPath">
            <xsl:with-param name="prevPath" select="$currPath"/>
          </xsl:call-template>
        </xsl:for-each>
        <xsl:if test="not(parent::*)">
          <xsl:value-of select="$currPath"/>
          <xsl:text>&#xA;</xsl:text>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>

Output

    /create[1]/article[1]/name[1]
    /create[1]/article[1]/description[1]
    /create[1]/article[1]/price[1]/amount[1]
    /create[1]/article[1]/price[1]/currency[1]
    /create[1]/article[1]/id[1]
    /create[1]/article[2]/name[1]
    /create[1]/article[2]/description[1]
    /create[1]/article[2]/price[1]/amount[1]
    /create[1]/article[2]/price[1]/currency[1]
    /create[1]/article[2]/id[1]

UPDATE

To make the XSLT work for all elements, simply remove the predicate [text()] from match="*[text()]". This prints a path for every element. If you do not want paths for elements that contain other elements (for example, create, article and price), add the predicate [not(*)]. Here is an updated example:

New XML Input

    <ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
      <article xmlns:ns1='http://predic8.com/material/1/'>
        <name />
        <description />
        <price xmlns:ns1='http://predic8.com/common/1/'>
          <amount />
          <currency xmlns:ns1='http://predic8.com/common/1/'></currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'></id>
      </article>
      <article xmlns:ns1='http://predic8.com/material/2/'>
        <name xmlns:ns1='http://predic8.com/material/2/'>some name</name>
        <description xmlns:ns1='http://predic8.com/material/2/'>some description</description>
        <price xmlns:ns1='http://predic8.com/common/2/'>
          <amount xmlns:ns1='http://predic8.com/common/2/'>00.01</amount>
          <currency xmlns:ns1='http://predic8.com/common/2/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/2/'>2</id>
      </article>
    </ns1:create>

XSLT 1.0

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="text()"/>

      <xsl:template match="*[not(*)]">
        <xsl:call-template name="genPath"/>
        <xsl:apply-templates select="node()"/>
      </xsl:template>

      <xsl:template name="genPath">
        <xsl:param name="prevPath"/>
        <xsl:variable name="currPath" select="concat('/',local-name(),'[',
          count(preceding-sibling::*[name() = name(current())])+1,']',$prevPath)"/>
        <xsl:for-each select="parent::*">
          <xsl:call-template name="genPath">
            <xsl:with-param name="prevPath" select="$currPath"/>
          </xsl:call-template>
        </xsl:for-each>
        <xsl:if test="not(parent::*)">
          <xsl:value-of select="$currPath"/>
          <xsl:text>&#xA;</xsl:text>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>

Output

    /create[1]/article[1]/name[1]
    /create[1]/article[1]/description[1]
    /create[1]/article[1]/price[1]/amount[1]
    /create[1]/article[1]/price[1]/currency[1]
    /create[1]/article[1]/id[1]
    /create[1]/article[2]/name[1]
    /create[1]/article[2]/description[1]
    /create[1]/article[2]/price[1]/amount[1]
    /create[1]/article[2]/price[1]/currency[1]
    /create[1]/article[2]/id[1]

If you remove the predicate [not(*)] , this is what the output looks like (the path is output for each element):

    /create[1]
    /create[1]/article[1]
    /create[1]/article[1]/name[1]
    /create[1]/article[1]/description[1]
    /create[1]/article[1]/price[1]
    /create[1]/article[1]/price[1]/amount[1]
    /create[1]/article[1]/price[1]/currency[1]
    /create[1]/article[1]/id[1]
    /create[1]/article[2]
    /create[1]/article[2]/name[1]
    /create[1]/article[2]/description[1]
    /create[1]/article[2]/price[1]
    /create[1]/article[2]/price[1]/amount[1]
    /create[1]/article[2]/price[1]/currency[1]
    /create[1]/article[2]/id[1]

Here's another version of the XSLT, which is about 65% faster:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="text()"/>

      <xsl:template match="*[not(*)]">
        <xsl:for-each select="ancestor-or-self::*">
          <xsl:value-of select="concat('/',local-name(),'[',count(preceding-sibling::*[local-name()=local-name(current())])+1,']')"/>
        </xsl:for-each>
        <xsl:text>&#xA;</xsl:text>
        <xsl:apply-templates select="node()"/>
      </xsl:template>
    </xsl:stylesheet>
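Since the question mentions Java, here is a minimal sketch of how the shorter stylesheet above could be run from Java using the standard javax.xml.transform API. The class name XsltPathDemo and the method name flatten are made up for illustration, and the stylesheet is embedded as a string only to keep the example self-contained; in practice you would probably load it from a file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of any library.
class XsltPathDemo {
    // The shorter stylesheet from above, embedded as a string.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:template match='text()'/>"
      + "<xsl:template match='*[not(*)]'>"
      + "<xsl:for-each select='ancestor-or-self::*'>"
      + "<xsl:value-of select=\"concat('/',local-name(),'[',"
      + "count(preceding-sibling::*[local-name()=local-name(current())])+1,']')\"/>"
      + "</xsl:for-each>"
      + "<xsl:text>&#xA;</xsl:text>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the text output,
    // one path per line.
    static String flatten(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<ns1:create xmlns:ns1='urn:example'>"
                   + "<article><name>foo</name><price><amount/></price></article>"
                   + "</ns1:create>";
        System.out.print(flatten(xml));
    }
}
```

If you transform many documents, it is probably worth compiling the stylesheet once (for example via TransformerFactory.newTemplates) rather than building a new Transformer from the source on every call.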

My recommendation is to use a SAX parser: see the Wikipedia entry for SAX, and Xerces, the SAX parser for Java from Apache.

On each startElement, add the element name to the end of a list. On each endElement, remove the last entry from the list. When you run into content and want to output its XPath, you can build it by iterating over the list.
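The approach described above could be sketched roughly as follows, using the SAX parser that ships with the JDK. This is only an illustration under a few assumptions: the class name SaxPathCollector and the method flatten are invented here; a second stack of flags (not mentioned above) is added so that a path is emitted for every element without child elements, which also covers empty elements; and a LinkedHashSet is used so that repeated siblings collapse into one path without positional indexes, as in the question's desired output.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler class, not part of any library.
class SaxPathCollector extends DefaultHandler {
    // Names of the currently open elements, root first.
    private final Deque<String> names = new ArrayDeque<>();
    // For each open element: has a child element been seen yet?
    private final Deque<Boolean> sawChild = new ArrayDeque<>();
    // Distinct paths in order of first appearance.
    private final Set<String> paths = new LinkedHashSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!sawChild.isEmpty()) {   // mark the parent as a non-leaf
            sawChild.pop();
            sawChild.push(true);
        }
        names.addLast(localName);    // local name only, so prefixes are ignored
        sawChild.push(false);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!sawChild.pop()) {       // leaf element: record its path
            paths.add("/" + String.join("/", names));
        }
        names.removeLast();
    }

    static List<String> flatten(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // needed for localName to be filled in
        SaxPathCollector handler = new SaxPathCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return new ArrayList<>(handler.paths);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns1:create xmlns:ns1='urn:example'><article><name>foo</name>"
                   + "<price><amount/><currency></currency></price></article></ns1:create>";
        flatten(xml).forEach(System.out::println);
    }
}
```

Attributes are simply never inspected, so they are ignored as the question requires. Because SAX streams the document, memory use stays proportional to the nesting depth plus the number of distinct paths.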
