How can I get an array of elements, including missing elements, using XPath in XSLT?

Given the following XML-compliant HTML:

    <div>
        <a>a1</a>
        <b>b1</b>
    </div>
    <div>
        <b>b2</b>
    </div>
    <div>
        <a>a3</a>
        <b>b3</b>
        <c>c3</c>
    </div>

Executing //a will return:

 [a1,a3] 

The problem: the value from the third div now sits in second position, because a div without an a element is skipped entirely.

How can I write an XPath expression that returns all a elements like this:

 [a1, null, a3] 

Likewise, for //c, I wonder whether it is possible to get

 [null, null, c3] 

UPDATE: Consider another scenario in which there are no common <div> parents.

    <h1>heading1</h1>
    <a>a1</a>
    <b>b1</b>
    <h1>heading2</h1>
    <b>b2</b>
    <h1>heading3</h1>
    <a>a3</a>
    <b>b3</b>
    <c>c3</c>

UPDATE: I can now use XSLT.

java html xml xpath xslt
3 answers

There is no null value in XPath. A related question also explains this: http://www.velocityreviews.com/forums/t686805-xpath-query-to-return-null-values.html

Actually, you have three options:

  • Do not use XPath at all.
  • Use this: //a | //div[not(a)] , which returns the div element wherever there is no a , and let your Java code treat any div in the result as "no a element". Depending on the context, this may even let you output something more useful, since you have access to the entire contents of the div; for example, a message such as "no a found in div (some identifier)".
  • Preprocess the XML with an XSLT transformation that inserts an a element, with a suitable default value, into any div element that does not already have one.
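
To make the second option concrete, here is a minimal Java sketch using the JDK's built-in javax.xml.xpath API. The class name PlaceholderColumn, the column method and the wrapping <t> root element are my own assumptions, not part of the question. It also assumes that the node list comes back in document order, which the JDK's bundled XPath implementation does in practice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class PlaceholderColumn {
    // The question's markup, wrapped in an assumed <t> root element.
    static final String SAMPLE = "<t><div><a>a1</a><b>b1</b></div>"
            + "<div><b>b2</b></div>"
            + "<div><a>a3</a><b>b3</b><c>c3</c></div></t>";

    // Evaluates expr and returns the text of every node named `wanted`;
    // any other node in the result (the placeholder div) becomes null.
    static List<String> column(String xml, String expr, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                result.add(wanted.equals(n.getNodeName()) ? n.getTextContent() : null);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(column(SAMPLE, "//a | //div[not(a)]", "a")); // [a1, null, a3]
        System.out.println(column(SAMPLE, "//c | //div[not(c)]", "c")); // [null, null, c3]
    }
}
```

The div placeholders never leave the helper: they only mark the position where a null belongs.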

Your second case is a bit complicated, and to be honest, I would not recommend using XPath for it at all, but it can be done:

//a | //h1[not(following-sibling::a) or generate-id(.) != generate-id(following-sibling::a[1]/preceding-sibling::h1[1])]

This matches every a element, plus every h1 element that has no a element between it and the next h1 element (or the end of the document). As Dimitri noted, this only works when used from XSLT, since generate-id is an XSLT function.

If you are not using it from XSLT, you can use this rather convoluted expression:

//a | //h1[not(following-sibling::a) or count(. | preceding-sibling::h1) != count(following-sibling::a[1]/preceding-sibling::h1)]

It works by matching h1 elements for which the count of the element itself plus all preceding h1 siblings differs from the count of h1 elements preceding the next a . There may be a more efficient way to do this in XPath, but if it is going to get more convoluted than that, I definitely recommend not using XPath at all.
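
Here is a rough Java sketch of how the count()-based expression could be used outside XSLT. The class FlatColumn, its methods and the <t> wrapper element are assumptions of mine. I have also parameterised the element name so the same expression serves a, b and c. As before, it relies on the result being returned in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class FlatColumn {
    // The flat markup from the question, wrapped in an assumed <t> root.
    static final String SAMPLE = "<t><h1>heading1</h1><a>a1</a><b>b1</b>"
            + "<h1>heading2</h1><b>b2</b>"
            + "<h1>heading3</h1><a>a3</a><b>b3</b><c>c3</c></t>";

    // Builds the count()-based expression for an arbitrary element name.
    static String expression(String name) {
        return "//" + name + " | //h1[not(following-sibling::" + name + ")"
                + " or count(. | preceding-sibling::h1)"
                + " != count(following-sibling::" + name + "[1]/preceding-sibling::h1)]";
    }

    // Every selected h1 stands for a group without the element, so it maps to null.
    static List<String> column(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression(name), doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                result.add(name.equals(n.getNodeName()) ? n.getTextContent() : null);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(column(SAMPLE, "a")); // [a1, null, a3]
        System.out.println(column(SAMPLE, "c")); // [null, null, c3]
    }
}
```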


Solution to the first problem:

This XPath expression:

  /*/div/a | /*/div[not(a)] 

when evaluated against the following XML document:

    <t>
        <div>
            <a>a1</a>
            <b>b1</b>
        </div>
        <div>
            <b>b2</b>
        </div>
        <div>
            <a>a3</a>
            <b>b3</b>
            <c>c3</c>
        </div>
    </t>

selects the following three nodes ( a , div , a ):

    <a>a1</a>
    <div>
        <b>b2</b>
    </div>
    <a>a3</a>

In your Java array, any selected element that is not an a should be treated as (or replaced by) null .


Here is one solution to the second problem:

Use these XPath expressions to select the a elements of each group:

For the first group:

    /*/h1[1]
        /following-sibling::a
            [not(/*/h1[2])
            or
             count(.|/*/h1[2]/preceding-sibling::a)
            =
             count(/*/h1[2]/preceding-sibling::a)
            ]

For the second group:

    /*/h1[2]
        /following-sibling::a
            [not(/*/h1[3])
            or
             count(.|/*/h1[3]/preceding-sibling::a)
            =
             count(/*/h1[3]/preceding-sibling::a)
            ]

And for the third group:

    /*/h1[3]
        /following-sibling::a
            [not(/*/h1[4])
            or
             count(.|/*/h1[4]/preceding-sibling::a)
            =
             count(/*/h1[4]/preceding-sibling::a)
            ]

If count(/*/h1) is $cnt , generate $cnt such expressions (for i = 1 to $cnt ) and evaluate them all. The nodes selected by each of them either contain an a element or do not. If the $k-th group (the nodes selected by evaluating the $k-th expression) contains an a , use its string value as the $k-th item of the required array; otherwise, generate null for the $k-th item of the required array.
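
For illustration, a Java sketch of this generate-and-evaluate loop might look like the following. The class GroupedColumn and its method names are hypothetical, and the expression is generated for the element name a only. It takes the first a of each group, which is an assumption for groups that contain more than one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class GroupedColumn {
    static final String SAMPLE = "<t><h1>heading1</h1><a>a1</a><b>b1</b>"
            + "<h1>heading2</h1><b>b2</b>"
            + "<h1>heading3</h1><a>a3</a><b>b3</b><c>c3</c></t>";

    // Builds the expression for group i (1-based), following the pattern above.
    static String groupExpr(int i) {
        String next = "/*/h1[" + (i + 1) + "]";
        return "/*/h1[" + i + "]/following-sibling::a"
                + "[not(" + next + ")"
                + " or count(. | " + next + "/preceding-sibling::a)"
                + " = count(" + next + "/preceding-sibling::a)]";
    }

    static List<String> column(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // $cnt: the number of groups equals the number of h1 elements.
            int cnt = ((Double) xp.evaluate("count(/*/h1)", doc, XPathConstants.NUMBER)).intValue();
            List<String> result = new ArrayList<>();
            for (int i = 1; i <= cnt; i++) {
                NodeList group = (NodeList) xp.evaluate(groupExpr(i), doc, XPathConstants.NODESET);
                // An empty group yields null; otherwise use the first a's string value.
                result.add(group.getLength() > 0 ? group.item(0).getTextContent() : null);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(column(SAMPLE)); // [a1, null, a3]
    }
}
```

Unlike the union-based approach, this one does not depend on the order in which nodes are returned, because each array slot is filled by its own expression.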

Here is an XSLT-based verification of the XPath expressions above:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="/">
      <xsl:variable name="vGroup1" select=
       "/*/h1[1]
          /following-sibling::a
             [not(/*/h1[2])
             or
              count(.|/*/h1[2]/preceding-sibling::a)
             =
              count(/*/h1[2]/preceding-sibling::a)
             ]
       "/>

      <xsl:variable name="vGroup2" select=
       "/*/h1[2]
          /following-sibling::a
             [not(/*/h1[3])
             or
              count(.|/*/h1[3]/preceding-sibling::a)
             =
              count(/*/h1[3]/preceding-sibling::a)
             ]
       "/>

      <xsl:variable name="vGroup3" select=
       "/*/h1[3]
          /following-sibling::a
             [not(/*/h1[4])
             or
              count(.|/*/h1[4]/preceding-sibling::a)
             =
              count(/*/h1[4]/preceding-sibling::a)
             ]
       "/>

      Group1: "<xsl:copy-of select="$vGroup1"/>"
      Group2: "<xsl:copy-of select="$vGroup2"/>"
      Group3: "<xsl:copy-of select="$vGroup3"/>"
     </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to the following XML document (the OP did not provide a complete, well-formed XML document):

    <t>
        <h1>heading1</h1>
        <a>a1</a>
        <b>b1</b>
        <h1>heading2</h1>
        <b>b2</b>
        <h1>heading3</h1>
        <a>a3</a>
        <b>b3</b>
        <c>c3</c>
    </t>

the three XPath expressions are evaluated, and the nodes selected by each of them are output:

    Group1: "<a>a1</a>"
    Group2: ""
    Group3: "<a>a3</a>"

Explanation

We use the well-known Kayessian formula for the intersection of two node-sets:

 $ns1[count(. | $ns2) = count($ns2)] 

The result of evaluating this expression contains exactly the nodes that belong to both node-set $ns1 and node-set $ns2 .

It remains to substitute the expressions relevant to this problem for $ns1 and $ns2 .

For $ns1 , substitute:

 /*/h1[1] /following-sibling::a 

and for $ns2 , substitute:

 /*/h1[2] /preceding-sibling::a 

In other words, the a elements that lie between the first and the second /*/h1 are the intersection of the a elements that are following siblings of /*/h1[1] and the a elements that are preceding siblings of /*/h1[2] .

This expression is problematic only for the a elements that follow the last /*/h1 element. Therefore, we add a condition that checks that no next /*/h1 element exists, and combine it with the intersection test using or .

Finally, as a guide for a Java implementation, here is a complete XSLT transformation that does something similar: it produces a serialized array and can be mechanically translated into a corresponding Java solution:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:my="my:my">
     <xsl:output method="text"/>

     <my:null>null</my:null>
     <my:Q>"</my:Q>

     <xsl:variable name="vNull" select="document('')/*/my:null"/>
     <xsl:variable name="vQ" select="document('')/*/my:Q"/>

     <xsl:template match="/">
      <xsl:variable name="vGroup1" select=
       "/*/h1[1]
          /following-sibling::a
             [not(/*/h1[2])
             or
              count(.|/*/h1[2]/preceding-sibling::a)
             =
              count(/*/h1[2]/preceding-sibling::a)
             ]
       "/>

      <xsl:variable name="vGroup2" select=
       "/*/h1[2]
          /following-sibling::a
             [not(/*/h1[3])
             or
              count(.|/*/h1[3]/preceding-sibling::a)
             =
              count(/*/h1[3]/preceding-sibling::a)
             ]
       "/>

      <xsl:variable name="vGroup3" select=
       "/*/h1[3]
          /following-sibling::a
             [not(/*/h1[4])
             or
              count(.|/*/h1[4]/preceding-sibling::a)
             =
              count(/*/h1[4]/preceding-sibling::a)
             ]
       "/>

      [<xsl:value-of select=
       "concat($vQ[$vGroup1/self::a[1]],
               $vGroup1/self::a[1],
               $vQ[$vGroup1/self::a[1]],
               $vNull[not($vGroup1/self::a[1])])"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select=
       "concat($vQ[$vGroup2/self::a[1]],
               $vGroup2/self::a[1],
               $vQ[$vGroup2/self::a[1]],
               $vNull[not($vGroup2/self::a[1])])"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select=
       "concat($vQ[$vGroup3/self::a[1]],
               $vGroup3/self::a[1],
               $vQ[$vGroup3/self::a[1]],
               $vNull[not($vGroup3/self::a[1])])"/>]
     </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to the same XML document (see above), the desired, correct result is obtained :

  ["a1",null,"a3"] 

Update2

The OP has since added that an XSLT solution is acceptable. Here is one:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:my="my:my" exclude-result-prefixes="xsl">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:key name="kFollowing" match="a"
      use="generate-id(preceding-sibling::h1[1])"/>

     <my:null/>
     <xsl:variable name="vNull" select="document('')/*/my:null"/>

     <xsl:template match="/*">
      <xsl:copy-of select=
       "h1/following-sibling::a[1]
       |
        h1[not(key('kFollowing', generate-id()))]"/>
      =============================================
      <xsl:apply-templates select="h1"/>
     </xsl:template>

     <xsl:template match="h1">
      <xsl:variable name="vAsInGroup" select=
       "key('kFollowing', generate-id())"/>
      <xsl:copy-of select="$vAsInGroup[1] | $vNull[not($vAsInGroup)]"/>
     </xsl:template>
    </xsl:stylesheet>

This transformation implements two different solutions. They differ in which element is used to represent "null". In the first, it is an h1 element. This is not recommended, because every h1 already has its own meaning, which is different from "representing null". The second solution uses a special my:null element to represent null.

When this transformation is applied to the same XML document as above:

    <t>
        <h1>heading1</h1>
        <a>a1</a>
        <b>b1</b>
        <h1>heading2</h1>
        <b>b2</b>
        <h1>heading3</h1>
        <a>a3</a>
        <b>b3</b>
        <c>c3</c>
    </t>

each of the two XPath expressions (which contain references to the XSLT key() function) is evaluated, and the selected nodes are output (above and below the "========" line, respectively):

    <a>a1</a>
    <h1>heading2</h1>
    <a>a3</a>
    =============================================
    <a>a1</a>
    <my:null xmlns:my="my:my" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>
    <a>a3</a>

Performance note:

Since keys are used, this solution will be significantly more efficient when more than one lookup is performed, for example when the corresponding arrays have to be produced for a , b and c .


I suggest the following, which could be rewritten as an xsl:function with the name of the parent element as a parameter (here: div).

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

        <xsl:template match="/">
            <root>
                <aList><xsl:copy-of select="$divIncludingNulls//a"/></aList>
                <bList><xsl:copy-of select="$divIncludingNulls//b"/></bList>
                <cList><xsl:copy-of select="$divIncludingNulls//c"/></cList>
            </root>
        </xsl:template>

        <xsl:variable name="divChild" select="distinct-values(//div/*/name())"/>

        <xsl:variable name="divIncludingNulls">
            <xsl:for-each select="//div">
                <xsl:variable name="divElt" select="."/>
                <div>
                    <xsl:for-each select="$divChild">
                        <xsl:variable name="divEltvalue" select="$divElt/*[name()=current()]"/>
                        <xsl:element name="{.}">
                            <xsl:choose>
                                <xsl:when test="$divEltvalue"><xsl:value-of select="$divEltvalue"/></xsl:when>
                                <xsl:otherwise>null</xsl:otherwise>
                            </xsl:choose>
                        </xsl:element>
                    </xsl:for-each>
                </div>
            </xsl:for-each>
        </xsl:variable>
    </xsl:stylesheet>

When applied to

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <div>
            <a>a1</a>
            <b>b1</b>
        </div>
        <div>
            <b>b2</b>
        </div>
        <div>
            <a>a3</a>
            <b>b3</b>
            <c>c3</c>
        </div>
    </root>

the output is

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <aList>
            <a>a1</a>
            <a>null</a>
            <a>a3</a>
        </aList>
        <bList>
            <b>b1</b>
            <b>b2</b>
            <b>b3</b>
        </bList>
        <cList>
            <c>null</c>
            <c>null</c>
            <c>c3</c>
        </cList>
    </root>
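
If the padding has to happen in Java rather than in XSLT 2.0, roughly the same per-div lookup can be sketched as below. This is not a translation of the stylesheet above: it swaps the temporary tree for a plain loop that evaluates a relative XPath against each parent element and stores a real Java null instead of the text "null". The class PerDivColumns and its columns method are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class PerDivColumns {
    static final String SAMPLE = "<root><div><a>a1</a><b>b1</b></div>"
            + "<div><b>b2</b></div>"
            + "<div><a>a3</a><b>b3</b><c>c3</c></div></root>";

    // For each child name, builds one list with an entry (or null) per parent element.
    static Map<String, List<String>> columns(String xml, String parent, String... names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList parents = (NodeList) xp.evaluate("//" + parent, doc, XPathConstants.NODESET);
            Map<String, List<String>> out = new LinkedHashMap<>();
            for (String name : names) {
                List<String> col = new ArrayList<>();
                for (int i = 0; i < parents.getLength(); i++) {
                    // Relative path: look only at the children of this parent.
                    NodeList hit = (NodeList) xp.evaluate(name, parents.item(i), XPathConstants.NODESET);
                    col.add(hit.getLength() > 0 ? hit.item(0).getTextContent() : null);
                }
                out.put(name, col);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // {a=[a1, null, a3], b=[b1, b2, b3], c=[null, null, c3]}
        System.out.println(columns(SAMPLE, "div", "a", "b", "c"));
    }
}
```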