I could not find a direct way to select all the nodes that lie between two anchors (a pair of <a></a> tags) in an HTML file.
The first anchor has the following format:
<a href="file://START..."></a>
Second anchor:
<a href="file://END..."></a>
I checked that both can be selected using starts-with (note that I am using HTML Agility Pack):
HtmlNode n0 = html.DocumentNode.SelectSingleNode("//a[starts-with(@href,'file://START')]");
HtmlNode n1 = html.DocumentNode.SelectSingleNode("//a[starts-with(@href,'file://END')]");
With that in mind, and with my amateur XPath skills, I wrote the following expression to get all the tags between the two anchors:
html.DocumentNode.SelectNodes("//*[not(following-sibling::a[starts-with(@href,'file://START0')]) and not (preceding-sibling::a[starts-with(@href,'file://END0')])]");
This runs without errors, but it selects the entire HTML document!
I need, for example, for the following HTML fragment:
<html>
...
<a href="file://START0"></a>
<p>First nodes</p>
<p>First nodes
<span>X</span>
</p>
<p>First nodes</p>
<a href="file://END0"></a>
...
</html>
to remove both anchors and the three P elements (including, of course, the inner SPAN).
How can I do this?
I don't know whether XPath 2.0 offers a better way to do this.
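To make the goal concrete, here is a minimal sketch of the behavior I am after, done by walking siblings in C# rather than with a single XPath expression. It assumes both anchors share the same parent and that the START anchor precedes the END anchor. `AnchorRange.RemoveRange` is a hypothetical helper, not part of HTML Agility Pack, and the inline HTML is a shortened stand-in for the fragment above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class AnchorRange
{
    // Hypothetical helper (not part of HTML Agility Pack).
    // Removes 'start', 'end', and every sibling node between them.
    // Returns the number of removed nodes, or 0 if nothing was removed.
    public static int RemoveRange(HtmlNode start, HtmlNode end)
    {
        // Assumption: both anchors exist and share the same parent.
        if (start == null || end == null) return 0;
        if (!ReferenceEquals(start.ParentNode, end.ParentNode)) return 0;

        // Collect first, remove afterwards, so the sibling chain
        // is not modified while it is being walked.
        var range = new List<HtmlNode>();
        bool reachedEnd = false;
        for (HtmlNode n = start; n != null; n = n.NextSibling)
        {
            range.Add(n); // includes text nodes (whitespace) as well as elements
            if (ReferenceEquals(n, end)) { reachedEnd = true; break; }
        }

        // If 'end' does not follow 'start', leave the document untouched.
        if (!reachedEnd) return 0;

        foreach (HtmlNode n in range) n.Remove();
        return range.Count;
    }
}

static class Program
{
    static void Main()
    {
        var html = new HtmlDocument();
        html.LoadHtml(
            "<div><p>keep</p><a href=\"file://START0\"></a>" +
            "<p>First nodes</p><p>First nodes<span>X</span></p><p>First nodes</p>" +
            "<a href=\"file://END0\"></a><p>keep</p></div>");

        HtmlNode n0 = html.DocumentNode.SelectSingleNode("//a[starts-with(@href,'file://START')]");
        HtmlNode n1 = html.DocumentNode.SelectSingleNode("//a[starts-with(@href,'file://END')]");

        AnchorRange.RemoveRange(n0, n1);
        Console.WriteLine(html.DocumentNode.OuterHtml);
    }
}
```

This does what I want for sibling anchors, but I would still prefer a single XPath expression that selects the same nodes, if one exists.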
* EDIT (special case!) *
In some cases the anchors are wrapped in a paragraph, that is: <p><a href="file://..."></a></p>
So, instead of:
<a href="file://START..."></a>
<a href="file://END..."></a>
the document may contain:
<p>
<a href="file://START..."></a>
</p>
<p>
<a href="file://END..."></a>
</p>
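For the case above, where each anchor sits inside its own <p>, the node that bounds the range is the wrapping paragraph rather than the anchor itself. A rough sketch of how I would pick the boundary node, assuming the wrapper is always a direct <p> parent of the anchor (`AnchorBoundary.Resolve` is a hypothetical helper, not an HTML Agility Pack API):

```csharp
using System;
using HtmlAgilityPack;

static class AnchorBoundary
{
    // Hypothetical helper: returns the node that should act as the range boundary.
    // If the anchor's direct parent is a <p> element, that <p> is returned;
    // otherwise the anchor itself is returned.
    public static HtmlNode Resolve(HtmlNode anchor)
    {
        if (anchor == null) return null;

        HtmlNode parent = anchor.ParentNode;
        bool wrapped = parent != null
            && parent.NodeType == HtmlNodeType.Element
            && string.Equals(parent.Name, "p", StringComparison.OrdinalIgnoreCase);

        return wrapped ? parent : anchor;
    }
}
```

The two resolved nodes would then be siblings again, so the range between them could be handled the same way as for unwrapped anchors.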