There is an important note in the W3C XPath document:
XML Path Language (XPath) Version 1.0
2 Location Paths
2.5 Abbreviated Syntax
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
This means that a double slash inside a path is not a shortcut for the descendant axis. It expands to /descendant-or-self::node()/ , which is a complete step of its own and so starts a new level of iteration over the XML tree: the step to the right of // is evaluated again for the current context node and for each of its descendants.
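The difference described in the note can be checked with a small sketch. It uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath (it has no descendant:: axis), so /descendant::para[1] is emulated here with iter(). The sample document and its id attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the question.
SAMPLE = """
<doc>
  <sec><para id="a"/><para id="b"/></sec>
  <sec><para id="c"/></sec>
</doc>
"""

root = ET.fromstring(SAMPLE)

# //para[1]: every <para> that is the first <para> child of its parent.
first_children = [p.get("id") for p in root.findall(".//para[1]")]

# /descendant::para[1]: the single first <para> in document order
# (emulated, because ElementTree does not support the descendant axis).
first_descendant = next(root.iter("para")).get("id")

print(first_children)    # -> ['a', 'c']
print(first_descendant)  # -> a
```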
So the exact meaning of the predicate in the path
//div[ descendant::table/descendant::td[4] ]
is:
- build a sequence of all <table> descendants of the current <div> ,
- for each such <table> , build the sequence of all its <td> descendants and keep the fourth item of that sequence,
- combine the results into one sequence; the predicate is true if that sequence is not empty.
Finally, the path returns all <div> elements in the document that contain a table with at least four <td> descendants (cells in nested tables count too, of course). Since the document does contain tables with four or more cells, the whole expression selects their respective <div> ancestors.
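As a rough illustration of this evaluation, the sketch below emulates //div[ descendant::table/descendant::td[4] ] by hand with Python's standard xml.etree.ElementTree. The sample markup, the id attributes and the helper name divs_with_fourth_descendant_cell are assumptions made for the example; the document from the question is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: rows of two and three cells, five cells in total.
SAMPLE = """
<html>
  <div id="with-table">
    <table>
      <tr><td>1</td><td>2</td></tr>
      <tr><td>3</td><td>4</td><td>5</td></tr>
    </table>
  </div>
  <div id="no-table"><p>text</p></div>
</html>
"""

def divs_with_fourth_descendant_cell(root):
    """Emulate //div[ descendant::table/descendant::td[4] ] (illustrative helper)."""
    selected = []
    for div in root.iter("div"):            # every <div> in the document
        for table in div.iter("table"):     # descendant::table, relative to the div
            cells = list(table.iter("td"))  # descendant::td, in document order
            if len(cells) >= 4:             # a fourth item exists, so td[4] is non-empty
                selected.append(div.get("id"))
                break
    return selected

root = ET.fromstring(SAMPLE)
result = divs_with_fourth_descendant_cell(root)
print(result)  # -> ['with-table']
```

The div is selected even though no single row holds four cells, because the descendant axis counts cells across all rows of the table.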
On the other hand, the predicate in
//div[ //table//td[4] ]
means:
- scan the entire document tree for <table> elements (more precisely, test the root node and each of its descendants for a <table> child),
- for each table found, scan its subtree for elements that have a fourth <td> child (i.e. test whether the table or any of its descendants has at least four <td> child elements).
Note that the predicate subexpression does not depend on the context node. It is an absolute path that resolves to some sequence of nodes (possibly empty), so the boolean value of the predicate depends only on the structure of the document. If it is true, the whole path returns all <div> elements in the document; otherwise it returns an empty sequence.
In other words, the predicate would be true if and only if some element inside any table had at least four <td> children.
As far as I can see, all <tr> rows contain two or three cells, so there is no element with four or more <td> children. The predicate subexpression therefore returns an empty sequence, the predicate is false, and every <div> is filtered out. Result: nothing (an empty sequence).
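A minimal sketch of this second case, again with Python's standard xml.etree.ElementTree and a made-up sample whose rows hold two and three cells. It assumes that ElementTree's .//table//td[4] is close enough to the absolute XPath //table//td[4] for a document whose root element is not itself a <table>: its position predicate counts same-tag siblings, which matches td[4] on the child axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: no row has four <td> children.
SAMPLE = """
<html>
  <div id="with-table">
    <table>
      <tr><td>1</td><td>2</td></tr>
      <tr><td>3</td><td>4</td><td>5</td></tr>
    </table>
  </div>
  <div id="no-table"><p>text</p></div>
</html>
"""

root = ET.fromstring(SAMPLE)

# The predicate path is global: it is evaluated against the whole document,
# not against the <div> being tested.
fourth_cells = root.findall(".//table//td[4]")
predicate = bool(fourth_cells)

# Every <div> shares the same predicate value, so the result is all or nothing.
selected = [d.get("id") for d in root.iter("div")] if predicate else []

print(predicate)  # -> False
print(selected)   # -> []
```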
Ciapan