XPath expression: selecting elements between href="expr" tags

I cannot find a straightforward way to select all the nodes that lie between two anchors (<a></a> tag pairs) in an HTML file.

The first anchor has the following format:

<a href="file://START..."></a>

Second anchor:

<a href="file://END..."></a>

I checked that both can be selected using starts-with (note that I am using HTML Agility Pack):

HtmlNode n0 = html.DocumentNode.SelectSingleNode("//a[starts-with(@href,'file://START')]");
HtmlNode n1 = html.DocumentNode.SelectSingleNode("//a[starts-with(@href,'file://END')]");

With that in mind, and with my amateur XPath skills, I wrote the following expression to get all the tags between the two anchors:

html.DocumentNode.SelectNodes("//*[not(following-sibling::a[starts-with(@href,'file://START0')]) and not (preceding-sibling::a[starts-with(@href,'file://END0')])]");

The expression runs, but it selects the entire HTML document!

For example, given the following HTML fragment:

<html>
...

<a href="file://START0"></a>
<p>First nodes</p>
<p>First nodes
    <span>X</span>
</p>
<p>First nodes</p>
<a href="file://END0"></a>

...
</html>

I need to extract both anchors and the three P elements between them (including, of course, the inner SPAN).

How can I do this?

I don't know whether XPath 2.0 offers better ways to do this.

* EDIT (special case!) *

Some of the anchors turn out to be wrapped in a P element:

<p><a href="file://..."></a></p>

So, in addition to this form:

<a href="file://START..."></a>
<!-- xhtml to be extracted -->
<a href="file://END..."></a>

there is also this one:

<p>
  <a href="file://START..."></a>
</p>
<!-- xhtml to be extracted -->

<p>
  <a href="file://END..."></a>
</p>


Here are two XPath expressions that select the nodes between the anchors.

XPath 1.0:

//a[starts-with(@href,'file://START')]/following-sibling::node()
     [count(.| //a[starts-with(@href,'file://END')]/preceding-sibling::node())
     =
      count(//a[starts-with(@href,'file://END')]/preceding-sibling::node())
     ]

XPath 2.0:

    //a[starts-with(@href,'file://START')]/following-sibling::node()
  intersect
    //a[starts-with(@href,'file://END')]/preceding-sibling::node()

The XPath 2.0 solution uses the XPath 2.0 intersect operator.

The XPath 1.0 solution uses the Kayessian formula (named after @Michael Kay) for the intersection of two node-sets:

$ns1[count(.|$ns2) = count($ns2)]
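
The identity behind the formula can be checked outside XPath. Here is a minimal Python sketch that uses plain sets of strings as stand-ins for node-sets. The function name and the sample values are illustrative only and are not part of any XPath library:

```python
def kayessian_intersection(ns1, ns2):
    """Mimic $ns1[count(.|$ns2) = count($ns2)] with Python sets."""
    ns2 = set(ns2)
    # A node n belongs to ns2 exactly when adding it does not grow the set.
    return [n for n in ns1 if len({n} | ns2) == len(ns2)]


# Stand-ins for "siblings after START" and "siblings before END".
after_start = ["p1", "p2", "p3", "a_end", "footer"]
before_end = ["header", "a_start", "p1", "p2", "p3"]

between = kayessian_intersection(after_start, before_end)
print(between)
```

The filter keeps only the members of the first set that are also in the second, which is what the predicate does for each node of $ns1.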

Verification with XSLT:

XSLT 1.0:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "    //a[starts-with(@href,'file://START')]/following-sibling::node()
         [count(.| //a[starts-with(@href,'file://END')]/preceding-sibling::node())
         =
          count(//a[starts-with(@href,'file://END')]/preceding-sibling::node())
         ]
  "/>
 </xsl:template>
</xsl:stylesheet>

Input XML document:

<html>...
    <a href="file://START0"></a>
    <p>First nodes</p>
    <p>First nodes    
        <span>X</span>
    </p>
    <p>First nodes</p>
    <a href="file://END0"></a>...
</html>

The transformation evaluates the XPath expression and outputs the selected nodes:

<p>First nodes</p>
<p>First nodes    
        <span>X</span>
</p>
<p>First nodes</p>

XSLT 2.0:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  " //a[starts-with(@href,'file://START')]/following-sibling::node()
   intersect
    //a[starts-with(@href,'file://END')]/preceding-sibling::node()
  "/>
 </xsl:template>
</xsl:stylesheet>

When the XSLT 2.0 transformation is applied to the same XML document (see above), it produces the same result.
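
For readers who want to cross-check the selection without an XSLT processor, here is a rough procedural equivalent in Python. It is a sketch under assumptions, not the method used above: the standard-library ElementTree does not support the following-sibling or preceding-sibling axes, so the siblings are walked by hand, only the first START/END pair is handled, and text nodes (which node() would also select) are ignored. The names SAMPLE and elements_between are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
  <a href="file://START0"></a>
  <p>First nodes</p>
  <p>First nodes
    <span>X</span>
  </p>
  <p>First nodes</p>
  <a href="file://END0"></a>
</html>"""


def is_anchor(el, prefix):
    """True if el is an <a> whose href starts with prefix."""
    return el.tag == "a" and el.get("href", "").startswith(prefix)


def elements_between(root, start_prefix, end_prefix):
    """Return the sibling elements between the first START and END anchors."""
    for parent in root.iter():
        children = list(parent)
        start = next((i for i, c in enumerate(children)
                      if is_anchor(c, start_prefix)), None)
        if start is None:
            continue
        # Look for the END anchor only among the later siblings.
        end = next((i for i in range(start + 1, len(children))
                    if is_anchor(children[i], end_prefix)), None)
        if end is not None:
            return children[start + 1:end]
    return []


root = ET.fromstring(SAMPLE)
between = elements_between(root, "file://START", "file://END")
print([el.tag for el in between])
```

Slicing the parent's child list between the two anchor positions plays the role of intersecting the two sibling axes.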


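Building on Dimitre's answer, I added an expression for the case where the anchors are wrapped in a p, again using the Kayessian formula (and XPath Visualizer ;-)). The node-sets are:

Node-set C: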

    "//p[.//a[starts-with(@href,'file://START')]]
         /following-sibling::node()"

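All sibling nodes that follow a p containing a START anchor.

Node-set D:

"./following-sibling::p[.//a[starts-with(@href,'file://END')]]
    /preceding-sibling::node()"

Relative to the current node: all sibling nodes that precede a following p containing an END anchor.


Their intersection: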

C ∩ D

    "//p[.//a[starts-with(@href,'file://START')]]
            /following-sibling::node()[
            count(.| ./following-sibling::p
                     [.//a[starts-with(@href,'file://END')]]
                       /preceding-sibling::node())
            =
            count(./following-sibling::p
                   [.//a[starts-with(@href,'file://END')]]
                     /preceding-sibling::node())
            ]"

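Finally, take the union of the node-sets:

(A ∩ B) ∪ (C ∩ D)

where:

  • the union is the XPath | operator;
  • node-sets A and B are those from @Dimitre's answer;
  • node-sets C and D are the ones described above.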
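
The combined case can also be sketched procedurally. The Python below is an illustration under assumptions rather than a translation of the union expression: it treats a child as a marker if it is either a matching anchor or a p that contains one, so unlike (A ∩ B) ∪ (C ∩ D) it also accepts mixed pairs (for example a bare START with a wrapped END). It handles only the first pair and ignores text nodes. The names WRAPPED, marks and between_markers are hypothetical.

```python
import xml.etree.ElementTree as ET

WRAPPED = """<html>
  <p><a href="file://START0"></a></p>
  <div>to be extracted</div>
  <p><a href="file://END0"></a></p>
</html>"""


def marks(child, prefix):
    """True if child is a matching <a>, or a <p> that contains one."""
    if child.tag == "a":
        candidates = [child]
    elif child.tag == "p":
        candidates = child.iter("a")
    else:
        candidates = []
    return any(a.get("href", "").startswith(prefix) for a in candidates)


def between_markers(root, start_prefix, end_prefix):
    """Sibling elements between START and END, wrapped in <p> or not."""
    for parent in root.iter():
        children = list(parent)
        start = next((i for i, c in enumerate(children)
                      if marks(c, start_prefix)), None)
        if start is None:
            continue
        end = next((i for i in range(start + 1, len(children))
                    if marks(children[i], end_prefix)), None)
        if end is not None:
            return children[start + 1:end]
    return []


root = ET.fromstring(WRAPPED)
result = between_markers(root, "file://START", "file://END")
print([el.tag for el in result])
```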