XPath to find a cell containing specific text when parsing HTML tables

Hope someone out there can quickly point me in the right direction with my XPath difficulties.

So far I can identify the correct table in my source HTML file, but I then need to process only the rows that have the text "Chapter" somewhere in their DOM.

My last attempt was to do this:

 // get the correct table
 HtmlTable table = page.getFirstByXPath("//table[2]");
 // now the failing bit....
 def rows = table.getByXPath("*/td[contains(text(),'Chapter')]")

I thought the XPath expression above would mean: get all the elements that have a child element 'td' which somewhere in its DOM contains the text "Chapter".

An example of a matching row from my source:

 <tr valign="top">
   <td nowrap="" align="Right">
     <font face="Verdana">
       <a href="index.cfm?a=1">Chapter 1</a>
     </font>
   </td>
   <td class="ChapterT">
     <font face="Verdana">DEFINITIONS</font>
   </td>
   <td>&nbsp;</td>
 </tr>

Any help / pointers are greatly appreciated.

Thanks,

+8
xml xpath groovy htmlunit
3 answers

Use this XPath:

 //td[contains(., 'Chapter')] 
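
To see why switching from text() to . matters, here is a minimal sketch. It uses the JDK's built-in XPath 1.0 engine (javax.xml.xpath) as a stand-in for HtmlUnit, and a simplified, well-formed copy of the row from the question; the class name ContainsDemo and the helper count are made up for illustration. In XPath 1.0, contains(text(), ...) only inspects the first text node that is a direct child of the td, which here is just whitespace, whereas contains(., ...) inspects the concatenated text of the whole td.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsDemo {
    // Simplified, well-formed version of the row from the question (assumption).
    static final String XML =
        "<table><tr>"
      + "<td> <font> <a href=\"index.cfm?a=1\">Chapter 1</a> </font> </td>"
      + "<td class=\"ChapterT\"> <font>DEFINITIONS</font> </td>"
      + "</tr></table>";

    // Hypothetical helper: number of nodes an XPath 1.0 expression selects in XML.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Only the td's own first text node (whitespace) is tested: no match.
        System.out.println(count("//td[contains(text(), 'Chapter')]"));
        // The full string value of each td is tested: the first td matches.
        System.out.println(count("//td[contains(., 'Chapter')]"));
    }
}
```

The class attribute "ChapterT" on the second td does not cause a match, because attribute values are not part of an element's string value.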
+12

You want all td elements under your current node, not every td in the document, which is what the currently accepted answer selects.

Use:

 .//td[.//text()[contains(., 'Chapter')]] 

This selects all descendants of the current node that are named td and that have at least one text-node descendant whose string value contains the string "Chapter".

If you know in advance that every td in this table has only one text node, this can be simplified to:

 .//td[contains(., 'Chapter')] 
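
The difference between these expressions can be checked with a small sketch. It again uses the JDK's XPath 1.0 engine rather than HtmlUnit, and an invented two-table document; the class RelativePathDemo and the helper countFromSecondTable are hypothetical names. The context node is the second table, as in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativePathDemo {
    // Invented sample: the first table also mentions "Chapter"; in the second
    // table one cell splits the word across two text nodes.
    static final String XML =
        "<html><body>"
      + "<table><tr><td>Chapter A</td></tr></table>"
      + "<table><tr>"
      + "<td><a>Chapter 1</a></td>"
      + "<td>DEFINITIONS</td>"
      + "<td>Chap<b>ter</b> 2</td>"
      + "</tr></table>"
      + "</body></html>";

    // Hypothetical helper: evaluates the expression with the second table as
    // the context node and returns how many nodes it selects.
    static int countFromSecondTable(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate("//table[2]", doc, XPathConstants.NODE);
            NodeList cells = (NodeList) xpath.evaluate(expression, table, XPathConstants.NODESET);
            return cells.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: ignores the context node and searches the whole document.
        System.out.println(countFromSecondTable("//td[contains(., 'Chapter')]"));
        // Relative path: only td descendants of the second table.
        System.out.println(countFromSecondTable(".//td[contains(., 'Chapter')]"));
        // Requires a single text node that contains the whole word.
        System.out.println(countFromSecondTable(".//td[.//text()[contains(., 'Chapter')]]"));
    }
}
```

With this sample the absolute path also picks up the cell in the first table, and the two relative forms differ only for the cell where "Chapter" is split across text nodes, which is why the simplification depends on each td having a single text node.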
+7

You are on the right track.
Your contains() call only looks at the text directly inside the td element, not at text in any of its child elements. Try this XPath, which you can read as: get each tr/td that has a child element containing the text "Chapter".

 tr/td[contains(*,"Chapter")] 
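
One caveat worth checking before relying on this form: in XPath 1.0, passing the node-set * to contains() converts it to the string value of its first node only, so only the first child element of each td is tested. The sketch below illustrates this with the JDK's XPath 1.0 engine and an invented table; FirstChildDemo and countFromTable are hypothetical names, and the alternative tr/td[*[contains(., 'Chapter')]] is my own suggestion, not part of the answer above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstChildDemo {
    // Invented sample: in the second row "Chapter" sits in the second child element.
    static final String XML =
        "<table>"
      + "<tr><td><font><a>Chapter 1</a></font></td></tr>"
      + "<tr><td><b>Intro</b><a>Chapter 2</a></td></tr>"
      + "</table>";

    // Hypothetical helper: evaluates the expression with the table element as
    // the context node and returns how many nodes it selects.
    static int countFromTable(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = doc.getDocumentElement();
            NodeList cells = (NodeList) xpath.evaluate(expression, table, XPathConstants.NODESET);
            return cells.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Tests only the first child element of each td.
        System.out.println(countFromTable("tr/td[contains(*, 'Chapter')]"));
        // Tests every child element of each td.
        System.out.println(countFromTable("tr/td[*[contains(., 'Chapter')]]"));
    }
}
```

For the row in the question each td has a single child element (font), so both forms behave the same there.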

Good luck.

+2
