Does a double slash in an XPath predicate work the same way as in the path itself?

I played with different XPath queries using XPather (which works only with older versions of Firefox) and noticed a difference between the results of the following queries.

The first one returns some results:

 //div[descendant::table/descendant::td[4]] 

The second one returns an empty list:

 //div[//table//td[4]] 

Are they different because of some rule, or is this just wrong behavior of a particular XPath implementation? (It looks like XPather uses Firefox's own engine; XPather is just a nice, simple GUI for running queries.)


In XPath 1.0, // is an abbreviation for /descendant-or-self::node()/, so your first path is /descendant-or-self::node()/div[descendant::table/descendant::td[4]], and the second is /descendant-or-self::node()/div[/descendant-or-self::node()/table/descendant-or-self::node()/td[4]]. So the main difference is that inside your first predicate you look down for descendants relative to the div element, while in the second predicate you look down for descendants from the root node / (also called the document node). You might want //div[.//table//td[4]] to bring the second path expression closer to the first.
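To see these expansions side by side, here is a minimal Python sketch that applies the abbreviation rule textually. The helper name expand_double_slash is hypothetical, and a plain string replacement is only safe for expressions like these, which contain no string literals; it is not a real XPath parser.

```python
# Hypothetical helper: expand the XPath 1.0 abbreviation "//" into its
# unabbreviated form. Naive string replacement, so it is only valid for
# expressions without string literals (such as the ones in this question).
def expand_double_slash(xpath: str) -> str:
    return xpath.replace("//", "/descendant-or-self::node()/")


if __name__ == "__main__":
    for expr in (
        "//div[descendant::table/descendant::td[4]]",
        "//div[//table//td[4]]",
        "//div[.//table//td[4]]",
    ):
        print(expand_double_slash(expr))
```

The second output shows why that predicate starts at the document node: its expansion begins with a leading /.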

[edit] Here is an example:

 <html>
   <body>
     <div>
       <table>
         <tbody>
           <tr> <td>1</td> </tr>
           <tr> <td>2</td> </tr>
           <tr> <td>3</td> </tr>
           <tr> <td>4</td> </tr>
         </tbody>
       </table>
     </div>
   </body>
 </html>

With this example, the path //div[descendant::table/descendant::td[4]] selects the div element, since it has a descendant table element that has a fourth descendant td element.

However, //div[.//table//td[4]] means //div[./descendant-or-self::node()/table/descendant-or-self::node()/td[4]], which is short for //div[./descendant-or-self::node()/table/descendant-or-self::node()/child::td[4]], and there is no element that has a fourth td child element (each tr has only one td child).

I hope this explains the difference. If you use //div[.//table/descendant::td[4]], you should get the same result as with your original first expression.
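Below is a rough Python sketch that hand-evaluates the three predicates on the example document, using only the standard library's xml.etree.ElementTree. The function names are hypothetical, and the axes are emulated with iter() and findall() rather than ElementTree's own limited path support; only element nodes are considered, which is enough for this example. Since a predicate only asks whether a node-set is non-empty, "td[4] exists" is written as "at least four candidates".

```python
import xml.etree.ElementTree as ET

# The example document from this answer (whitespace between elements dropped).
DOC = (
    "<html><body><div><table><tbody>"
    "<tr><td>1</td></tr><tr><td>2</td></tr>"
    "<tr><td>3</td></tr><tr><td>4</td></tr>"
    "</tbody></table></div></body></html>"
)


def descendants(node, tag):
    """descendant::tag -> proper descendants with that tag, in document order."""
    return [e for e in node.iter(tag) if e is not node]


def desc_or_self(node):
    """descendant-or-self::node(), restricted to element nodes."""
    return list(node.iter())


def descendant_form(div):
    # descendant::table/descendant::td[4]  ([4] counts each table's td descendants)
    return any(len(descendants(t, "td")) >= 4 for t in descendants(div, "table"))


def dot_slash_form(div):
    # .//table//td[4]
    # = ./descendant-or-self::node()/table/descendant-or-self::node()/child::td[4]
    # ([4] counts the td *children* of each node at or below a table)
    tables = [c for n in desc_or_self(div) for c in n.findall("table")]
    return any(len(n.findall("td")) >= 4 for t in tables for n in desc_or_self(t))


def fixed_form(div):
    # .//table/descendant::td[4]  ([4] counts each table's td descendants again)
    tables = [c for n in desc_or_self(div) for c in n.findall("table")]
    return any(len(descendants(t, "td")) >= 4 for t in tables)


root = ET.fromstring(DOC)
for div in root.iter("div"):
    print(descendant_form(div), dot_slash_form(div), fixed_form(div))  # True False True
```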


There is an important note in the W3C XPath document:

XML Path Language (XPath) Version 1.0
2 Location Paths
2.5 Abbreviated Syntax

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

This means that the double slash inside a path, although it expands to /descendant-or-self::node()/, also marks the starting point of a new level of iteration over the XML tree: the step to the right of // is evaluated separately with each node selected by descendant-or-self::node() as its context node, that is, with the current context node and with each of its descendants.
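A small sketch that emulates both location paths from the NOTE by hand, using Python's standard library. The document and the function names are made up for illustration. ElementTree has no separate document node, which does not matter here because the document element is <doc>, not <para>.

```python
import xml.etree.ElementTree as ET

# Made-up document: the first section has two <para> children.
DOC = "<doc><sec><para>a</para><para>b</para></sec><sec><para>c</para></sec></doc>"


def double_slash_para_1(root):
    # //para[1] = /descendant-or-self::node()/child::para[1]
    # The step "child::para[1]" runs once per node of descendant-or-self,
    # so the result is the first <para> child of every node that has one.
    return [n.find("para") for n in root.iter() if n.find("para") is not None]


def descendant_para_1(root):
    # /descendant::para[1]: one step, one node-set, one "first" element.
    return list(root.iter("para"))[:1]


root = ET.fromstring(DOC)
print([p.text for p in double_slash_para_1(root)])  # ['a', 'c']
print([p.text for p in descendant_para_1(root)])    # ['a']
```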

So the predicate in

 //div[ descendant::table/descendant::td[4] ] 

means:

  • build the sequence of all <table> descendants of the current <div>,
  • for each such <table>, build the sequence of all its <td> descendants (in document order),
  • take the fourth element of each table's own sequence, if it has one, and combine these into one sequence; the predicate is true if that sequence is not empty.
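As a sketch of this per-<div> evaluation (Python standard library; the document and function names are invented for illustration): the positional predicate [4] belongs to the descendant::td step, so under the XPath 1.0 rules it is applied to each table's own list of cells, not to one list combined across tables. A <div> holding two tables with two cells each is therefore not selected.

```python
import xml.etree.ElementTree as ET

# Made-up document: "split" has 4 cells spread over two tables,
# "single" has 4 cells in one table.
DOC = (
    "<body>"
    '<div id="split"><table><tr><td/><td/></tr></table>'
    "<table><tr><td/><td/></tr></table></div>"
    '<div id="single"><table><tr><td/><td/></tr><tr><td/><td/></tr></table></div>'
    "</body>"
)


def proper_descendants(node, tag):
    """descendant::tag -> descendants with that tag, in document order."""
    return [e for e in node.iter(tag) if e is not node]


def has_fourth_cell(div):
    # descendant::table/descendant::td[4], evaluated for one <div>
    for table in proper_descendants(div, "table"):
        cells = proper_descendants(table, "td")  # this table's own sequence
        if len(cells) >= 4:                      # td[4] exists for this table
            return True
    return False                                 # empty node-set -> false


root = ET.fromstring(DOC)
print([d.get("id") for d in root.iter("div") if has_fourth_cell(d)])  # ['single']
```

Cells of a nested table still count for the outer table, because they are descendants of it as well.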

Finally, the path returns all the <div> elements in the document that contain at least one <table> with four or more data cells (counting cells in tables nested inside it, of course). And since there are tables in the document with 4 cells or more, the whole expression selects their <div> ancestors.

On the other hand, the predicate in

 //div[ //table//td[4] ] 

means:

  • scan the entire document tree for <table> elements (more precisely, check the root node and each of its descendants for <table> children),
  • for each table found, scan its subtree for elements that have a fourth <td> child (i.e. test whether the table or any of its descendants has at least four <td> child elements).

Note that the predicate subexpression does not depend on the context node. It is an absolute path that selects some sequence of nodes (possibly empty), so the Boolean value of the predicate depends only on the structure of the document. If it is true, the entire path returns all <div> elements in the document; otherwise it returns an empty sequence.
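A rough Python sketch of this context independence (standard library only; the helper names and the two tiny documents are made up). It assumes the document element is not itself a <table>, because ElementTree has no separate document node whose child it could be.

```python
import xml.etree.ElementTree as ET


def global_predicate(root):
    # //table//td[4]
    # = /descendant-or-self::node()/table/descendant-or-self::node()/child::td[4]
    # True if a <table>, or any element inside one, has a fourth <td> child.
    tables = [c for n in root.iter() for c in n.findall("table")]
    return any(len(n.findall("td")) >= 4 for t in tables for n in t.iter())


def select_divs(root):
    # //div[//table//td[4]]: the predicate never looks at the <div>, so every
    # <div> gets the same answer; computing it once is equivalent.
    keep = global_predicate(root)
    return [d for d in root.iter("div") if keep]


three = ET.fromstring("<body><div><table><tr><td/><td/><td/></tr></table></div><div/></body>")
four = ET.fromstring("<body><div><table><tr><td/><td/><td/><td/></tr></table></div><div/></body>")
print(len(select_divs(three)), len(select_divs(four)))  # 0 2
```

In the second document even the empty <div/> is selected, although it contains no table at all.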

So the predicate would be true if some table contained an element with at least four <td> children.
As far as I can see, all <tr> rows in your document contain two or three cells, so there is no element with four or more <td> child elements. The predicate subexpression therefore returns an empty sequence, the predicate is false, and every <div> is filtered out. Result: nothing (an empty sequence).
