Solution for the first task:
This XPath expression:
/*/div/a | /*/div[not(a)]
when evaluated against the following XML document:
<t> <div> <a>a1</a> <b>b1</b> </div> <div> <b>b2</b> </div> <div> <a>a3</a> <b>b3</b> <c>c3</c> </div> </t>
selects the following three nodes (a, div, a):
<a>a1</a> <div> <b>b2</b> </div> <a>a3</a>
In your Java array, any selected element that is not an a should be treated as (or replaced by) null.
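As a rough illustration of that mapping, here is a minimal Java sketch using the JDK's javax.xml.xpath API. The class and method names are hypothetical. It assumes each div holds at most one a, and that the XPath implementation returns the node-set in document order (the JDK's built-in one does in practice, but the API does not promise it).

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstTaskSketch {
    // Hypothetical helper: one array entry per div; null where the div has no a child.
    static String[] aValuesOrNull(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Union of every a inside a div and every div that has no a child.
        NodeList nodes = (NodeList) xpath.evaluate(
                "/*/div/a | /*/div[not(a)]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        String[] result = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            // A selected div stands for "no a here", so it becomes null.
            result[i] = "a".equals(n.getNodeName()) ? n.getTextContent() : null;
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<t><div><a>a1</a><b>b1</b></div>"
                + "<div><b>b2</b></div>"
                + "<div><a>a3</a><b>b3</b><c>c3</c></div></t>";
        System.out.println(Arrays.toString(aValuesOrNull(xml)));
    }
}
```

If a div could contain several a elements, the array would no longer line up one-to-one with the div elements, so that case would need separate handling.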
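Here is one solution to the second task:
Use these XPath expressions to select the a elements of each group:
For the first group:
/*/h1[1]/following-sibling::a[not(/*/h1[2]) or count(.|/*/h1[2]/preceding-sibling::a) = count(/*/h1[2]/preceding-sibling::a)]
For the second group:
/*/h1[2]/following-sibling::a[not(/*/h1[3]) or count(.|/*/h1[3]/preceding-sibling::a) = count(/*/h1[3]/preceding-sibling::a)]
And for the third group:
/*/h1[3]/following-sibling::a[not(/*/h1[4]) or count(.|/*/h1[4]/preceding-sibling::a) = count(/*/h1[4]/preceding-sibling::a)]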
If the value of:
count(/*/h1)
is $cnt,
generate $cnt such expressions (for i = 1 to $cnt) and evaluate them all. The nodes selected by each expression either include an a element or they do not. If the k-th group (the nodes selected by the k-th expression) contains an a, use its string value as the k-th item of the required array; otherwise use null as the k-th item.
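The procedure just described might look roughly like this in Java. This is a sketch, not a definitive implementation: the class and method names are made up for illustration, and the expression builder simply follows the same pattern as the group expressions in this answer, with i and i + 1 substituted for the h1 positions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GroupArraySketch {
    // Builds the expression for the i-th group (1-based).
    static String groupExpr(int i) {
        String next = "/*/h1[" + (i + 1) + "]";
        return "/*/h1[" + i + "]/following-sibling::a"
                + "[not(" + next + ")"
                + " or count(.|" + next + "/preceding-sibling::a)"
                + " = count(" + next + "/preceding-sibling::a)]";
    }

    // Hypothetical helper: the first a of each group, or null for a group without one.
    static String[] firstAPerGroup(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        int cnt = ((Double) xpath.evaluate(
                "count(/*/h1)", doc, XPathConstants.NUMBER)).intValue();
        String[] result = new String[cnt];
        for (int k = 1; k <= cnt; k++) {
            NodeList group = (NodeList) xpath.evaluate(
                    groupExpr(k), doc, XPathConstants.NODESET);
            // An empty group means no a between this h1 and the next one.
            result[k - 1] = group.getLength() > 0
                    ? group.item(0).getTextContent() : null;
        }
        return result;
    }
}
```

For a document with three h1 elements where only the first and third are followed by an a, firstAPerGroup would be expected to return an array holding "a1", null and "a3".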
Here is an XSLT-based verification of the above XPath expressions:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:variable name="vGroup1" select=
   "/*/h1[1]
       /following-sibling::a
          [not(/*/h1[2])
          or
           count(.|/*/h1[2]/preceding-sibling::a)
          =
           count(/*/h1[2]/preceding-sibling::a)
          ]
   "/>

  <xsl:variable name="vGroup2" select=
   "/*/h1[2]
       /following-sibling::a
          [not(/*/h1[3])
          or
           count(.|/*/h1[3]/preceding-sibling::a)
          =
           count(/*/h1[3]/preceding-sibling::a)
          ]
   "/>

  <xsl:variable name="vGroup3" select=
   "/*/h1[3]
       /following-sibling::a
          [not(/*/h1[4])
          or
           count(.|/*/h1[4]/preceding-sibling::a)
          =
           count(/*/h1[4]/preceding-sibling::a)
          ]
   "/>

  Group1: "<xsl:copy-of select="$vGroup1"/>"
  Group2: "<xsl:copy-of select="$vGroup2"/>"
  Group3: "<xsl:copy-of select="$vGroup3"/>"
 </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document (the OP did not provide a complete, well-formed XML document!):
<t> <h1>heading1</h1> <a>a1</a> <b>b1</b> <h1>heading2</h1> <b>b2</b> <h1>heading3</h1> <a>a3</a> <b>b3</b> <c>c3</c> </t>
the three XPath expressions are evaluated and the nodes selected by each of them are output:
Group1: "<a>a1</a>"
Group2: ""
Group3: "<a>a3</a>"
Explanation
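We use the well-known Kayessian formula for the intersection of two node-sets:
$ns1[count(. | $ns2) = count($ns2)]
The result of evaluating this expression contains exactly the nodes that belong to both the node-set $ns1 and the node-set $ns2. It works because adding a node to $ns2 leaves the size of $ns2 unchanged only if that node is already a member of $ns2.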
It remains to replace $ns1 and $ns2 with expressions relevant to this problem.
Substitute $ns1 with:
/*/h1[1] /following-sibling::a
and substitute $ns2 with:
/*/h1[2] /preceding-sibling::a
In other words, the a elements between the first and the second /*/h1 are the intersection of the a elements that are following siblings of /*/h1[1] and the a elements that are preceding siblings of /*/h1[2].
This expression is problematic only for the a elements that follow the last /*/h1 element: there is no next /*/h1, so $ns2 is empty and the intersection is always empty. Therefore we add a condition that tests for the absence of a next /*/h1 element and combine it with the intersection test using the or operator.
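To see the last-group problem concretely, the following Java sketch evaluates the plain intersection and the guarded version for the third group of a sample document. The class, the embedded sample document and the select helper are assumptions made for this illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IntersectionSketch {
    // Sample document assumed for this sketch: three h1 groups, the second without an a.
    static final String XML = "<t><h1>heading1</h1><a>a1</a><b>b1</b>"
            + "<h1>heading2</h1><b>b2</b>"
            + "<h1>heading3</h1><a>a3</a><b>b3</b><c>c3</c></t>";

    // Hypothetical helper: text content of every node selected by expr.
    static List<String> select(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String ns1 = "/*/h1[3]/following-sibling::a";
        String ns2 = "/*/h1[4]/preceding-sibling::a"; // empty: there is no fourth h1
        // Plain intersection: count(.|$ns2) is 1 but count($ns2) is 0, so nothing is selected.
        System.out.println(select(ns1 + "[count(.|" + ns2 + ") = count(" + ns2 + ")]"));
        // Guarded version: not(/*/h1[4]) is true, so every a after the last h1 is kept.
        System.out.println(select(ns1 + "[not(/*/h1[4]) or count(.|" + ns2
                + ") = count(" + ns2 + ")]"));
    }
}
```

With this sample document the first call should select nothing and the second should select the a3 element.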
Finally, as a guide for the Java implementation, here is a complete XSLT transformation that does something similar: it produces a serialized array and can be mechanically translated into a corresponding Java solution:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:my="my:my">
 <xsl:output method="text"/>

 <!-- Literal strings kept in the stylesheet itself: the text "null" and a quote -->
 <my:null>null</my:null>
 <my:Q>"</my:Q>

 <xsl:variable name="vNull" select="document('')/*/my:null"/>
 <xsl:variable name="vQ" select="document('')/*/my:Q"/>

 <xsl:template match="/">
  <xsl:variable name="vGroup1" select=
   "/*/h1[1]
       /following-sibling::a
          [not(/*/h1[2])
          or
           count(.|/*/h1[2]/preceding-sibling::a)
          =
           count(/*/h1[2]/preceding-sibling::a)
          ]
   "/>

  <xsl:variable name="vGroup2" select=
   "/*/h1[2]
       /following-sibling::a
          [not(/*/h1[3])
          or
           count(.|/*/h1[3]/preceding-sibling::a)
          =
           count(/*/h1[3]/preceding-sibling::a)
          ]
   "/>

  <xsl:variable name="vGroup3" select=
   "/*/h1[3]
       /following-sibling::a
          [not(/*/h1[4])
          or
           count(.|/*/h1[4]/preceding-sibling::a)
          =
           count(/*/h1[4]/preceding-sibling::a)
          ]
   "/>

  <!-- Each item is either the quoted value of the first a in the group, or null -->
  [<xsl:value-of select=
   "concat($vQ[$vGroup1/self::a[1]],
           $vGroup1/self::a[1],
           $vQ[$vGroup1/self::a[1]],
           $vNull[not($vGroup1/self::a[1])])"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select=
   "concat($vQ[$vGroup2/self::a[1]],
           $vGroup2/self::a[1],
           $vQ[$vGroup2/self::a[1]],
           $vNull[not($vGroup2/self::a[1])])"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select=
   "concat($vQ[$vGroup3/self::a[1]],
           $vGroup3/self::a[1],
           $vQ[$vGroup3/self::a[1]],
           $vNull[not($vGroup3/self::a[1])])"/>]
 </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the same XML document (see above), the wanted, correct result is produced:
["a1",null,"a3"]
Update 2
The OP has now added that an XSLT solution is acceptable. Here is one:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:my="my:my" exclude-result-prefixes="xsl">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kFollowing" match="a"
  use="generate-id(preceding-sibling::h1[1])"/>

 <my:null/>
 <xsl:variable name="vNull" select="document('')/*/my:null"/>

 <xsl:template match="/*">
  <xsl:copy-of select=
   "h1/following-sibling::a[1]
   |
    h1[not(key('kFollowing', generate-id()))]"/>
=============================================
  <xsl:apply-templates select="h1"/>
 </xsl:template>

 <xsl:template match="h1">
  <xsl:variable name="vAsInGroup" select=
   "key('kFollowing', generate-id())"/>
  <xsl:copy-of select="$vAsInGroup[1] | $vNull[not($vAsInGroup)]"/>
 </xsl:template>
</xsl:stylesheet>
This transformation implements two different solutions. They differ in which element is used to represent "null". In the first solution it is an h1 element. This is not recommended, because any h1 already has a meaning of its own, which is different from "representing null". The second solution uses a special element, my:null, to represent null.
When this transformation is applied to the same XML document as above:
<t> <h1>heading1</h1> <a>a1</a> <b>b1</b> <h1>heading2</h1> <b>b2</b> <h1>heading3</h1> <a>a3</a> <b>b3</b> <c>c3</c> </t>
each of the two XPath expressions (both containing references to the XSLT key() function) is evaluated and the selected nodes are output (above and below the "========" line, respectively):
<a>a1</a>
<h1>heading2</h1>
<a>a3</a>
=============================================
<a>a1</a>
<my:null xmlns:my="my:my" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>
<a>a3</a>
Performance note:
Because keys are used, this solution will be significantly more efficient when more than one lookup is needed, for example when the corresponding arrays must be produced for a, b and c.
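On the Java side, a comparable effect can be approximated without XSLT keys by indexing the groups once and then reading off as many arrays as needed. The sketch below is a plain DOM traversal, not a translation of the key-based stylesheet; all names in it are hypothetical, and it assumes the h1 elements and their groups are direct children of the document element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class GroupIndexSketch {
    // One pass over the children: for each h1, map element name -> text of its first occurrence.
    static List<Map<String, String>> indexGroups(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Map<String, String>> groups = new ArrayList<>();
        Map<String, String> current = null;
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            if ("h1".equals(n.getNodeName())) {
                current = new HashMap<>(); // a new group starts at every h1
                groups.add(current);
            } else if (current != null) {
                current.putIfAbsent(n.getNodeName(), n.getTextContent());
            }
        }
        return groups;
    }

    // Array for one element name; entries are null for groups that lack it.
    static String[] column(List<Map<String, String>> groups, String name) {
        String[] out = new String[groups.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = groups.get(i).get(name);
        }
        return out;
    }
}
```

Calling indexGroups once and then column for "a", "b" and "c" reuses the same index, which is the same idea the key provides inside the stylesheet.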