XPath to find a cell with specific text parsing HTML tables

Question

Hope someone out there can quickly point me in the right direction with my XPath difficulties.

Currently, I have got to the point of identifying the correct table in my source HTML file, but I then need to process only the rows that have the text "Chapter" somewhere in their DOM.

My last attempt was to do this:

 // get the correct table
 HtmlTable table = page.getFirstByXPath("//table[2]");
 // now the failing bit....
 def rows = table.getByXPath("*/td[contains(text(),'Chapter')]")

I thought the XPath above meant: get all elements that have a child 'td' element which contains the text "Chapter" somewhere in its DOM.

An example of a suitable line from my source:

 <tr valign="top">
   <td nowrap="" align="Right">
     <font face="Verdana"> <a href="index.cfm?a=1">Chapter 1</a> </font>
   </td>
   <td class="ChapterT">
     <font face="Verdana">DEFINITIONS</font>
   </td>
   <td>&nbsp;</td>
 </tr>

Any help / pointers are greatly appreciated.

Thanks,

+8

xml xpath groovy htmlunit

Dave Mar 10 '12 at 3:48

3 answers

You want all td elements under your current node, not every td in the whole document, which is what the currently accepted answer selects.

Use:

 .//td[.//text()[contains(., 'Chapter')]]

This selects all descendants of the current node that are named td and that have at least one text-node descendant whose string value contains the string "Chapter".

If it is known in advance that every td in this table has only a single text node, this can be simplified to:

 .//td[contains(., 'Chapter')]
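
The difference between the two expressions above can be checked with a minimal sketch. It uses the JDK's built-in XPath 1.0 engine (`javax.xml.xpath`) rather than HtmlUnit, and the sample markup, the class name `TextNodeMatchDemo` and the helper `countFromSecondTable` are assumptions made up for illustration. The second row splits the word "Chapter" across two text nodes, which is the case where the simplified form behaves differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeMatchDemo {
    // Hypothetical sample markup, written as well-formed XML so the JDK parser accepts it.
    static final String PAGE =
        "<html><body>"
        + "<table><tr><td>Chapter 0</td></tr></table>"
        + "<table>"
        + "<tr><td><font><a href=\"index.cfm?a=1\">Chapter 1</a></font></td><td>DEFINITIONS</td></tr>"
        + "<tr><td>Chap<b>ter</b> 2</td><td>SCOPE</td></tr>"
        + "</table>"
        + "</body></html>";

    // Counts the nodes that expr selects when the second table is the context node.
    static int countFromSecondTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate("//table[2]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Requires a single text node that contains the whole word.
        System.out.println(countFromSecondTable(".//td[.//text()[contains(., 'Chapter')]]"));
        // Tests the concatenated text of the whole cell.
        System.out.println(countFromSecondTable(".//td[contains(., 'Chapter')]"));
    }
}
```

In this sketch the first expression selects only the "Chapter 1" cell, while the simplified one also selects the cell whose text is split by the `b` element, because `contains(., ...)` looks at the string value of the entire td.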

+7

Dimitre Novatchev Mar 10 '12 at 15:42

You're on the right track.
The contains() function applies to one specific node, not to the text in any of its child elements. Try this XPath, which you can read as: get every tr/td with any child element that contains the text "Chapter".

 tr/td[contains(*,"Chapter")]
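
One detail worth checking with this expression: in XPath 1.0, when a node-set such as `*` is passed to contains(), only the string value of its first node in document order is tested. The sketch below, which uses the JDK's `javax.xml.xpath` instead of HtmlUnit, compares it with a variant that moves the test into a predicate on `*`. The sample rows, the class name `FirstChildContainsDemo` and the helper `countFromTable` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstChildContainsDemo {
    // Hypothetical table: in the second row "Chapter 2" sits in the second child element of the td.
    static final String TABLE =
        "<table>"
        + "<tr><td><font><a>Chapter 1</a></font></td><td><font>DEFINITIONS</font></td></tr>"
        + "<tr><td><b>Note</b><a>Chapter 2</a></td><td><font>SCOPE</font></td></tr>"
        + "</table>";

    // Counts the nodes that expr selects when the table element is the context node.
    static int countFromTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(TABLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                expr, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Tests only the first child element of each td.
        System.out.println(countFromTable("tr/td[contains(*, 'Chapter')]"));
        // Tests every child element of each td.
        System.out.println(countFromTable("tr/td[*[contains(., 'Chapter')]]"));
    }
}
```

For the row in the question the first child element of the cell is the font element that wraps "Chapter 1", so both forms would match it; they differ only when the text lives in a later child element, as in the second sample row.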

Good luck.

+2

William Walseth Mar 10 '12 at 3:58

Kirill Polishchuk · Accepted Answer · 2012-03-10T06:16:07+0000

Use this XPath:

 //td[contains(., 'Chapter')]
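
Note that a path starting with `//` is absolute, so it searches the whole document even when it is evaluated against a table node. The following sketch illustrates this with the JDK's `javax.xml.xpath` rather than HtmlUnit; the two-table sample page, the class name `AbsoluteVsRelativeDemo` and the helper `countFromSecondTable` are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbsoluteVsRelativeDemo {
    // Hypothetical page with a matching cell in each of two tables.
    static final String PAGE =
        "<html><body>"
        + "<table><tr><td>Chapter 0</td></tr></table>"
        + "<table><tr><td>Chapter 1</td><td>DEFINITIONS</td></tr></table>"
        + "</body></html>";

    // Counts the nodes that expr selects when the second table is the context node.
    static int countFromSecondTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate("//table[2]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: starts at the document root and ignores the context table.
        System.out.println(countFromSecondTable("//td[contains(., 'Chapter')]"));
        // Relative path: searches only below the context table.
        System.out.println(countFromSecondTable(".//td[contains(., 'Chapter')]"));
    }
}
```

Here the absolute form also picks up the cell from the first table, whereas the form with the leading dot stays inside the table used as the context node.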
