Using xpath to select an item after another

Question


I saw similar questions, but the solutions I found would not work on the following. I am far from an XPath expert; I just need to parse the HTML. How can I select the table that follows "Header 2"? I thought my solution below should work, but apparently not. Can anyone help me here?

    from lxml import etree

    content = """
    <div>
      <p><b>Header 1</b></p>
      <p><b>Header 2</b><br></p>
      <table>
        <tr>
          <td>Something</td>
        </tr>
      </table>
    </div>
    """
    tree = etree.HTML(content)
    tree.xpath("//table/following::p/b[text()='Header 2']")
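A minimal sketch of why the attempt above returns an empty list, assuming lxml is installed and using the same markup as the question. The expression starts at the table and then searches *forward* for a `<p>`, but both `<p>` elements come before the table in document order.

```python
from lxml import etree

content = """<div> <p><b>Header 1</b></p> <p><b>Header 2</b><br></p>
<table> <tr> <td>Something</td> </tr> </table> </div>"""
tree = etree.HTML(content)

# The original expression: start at <table>, look for a later <p>.
print(tree.xpath("//table/following::p/b[text()='Header 2']"))  # []

# Nothing at all comes after the table, so any following:: step from it is empty.
print(tree.xpath("//table/following::*"))  # []
```

The direction has to be reversed: either start from the header and look forward for the table, or start from the table and look backward for the header.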


xpath lxml

jseabold Oct 9 '13 at 18:29


2 answers

Some alternatives to @Arup's answer:

 tree.xpath("//p[b='Header 2']/following-sibling::table[1]")

selects the first table sibling after the p that contains a b with the text "Header 2"

 tree.xpath("//b[.='Header 2']/following::table[1]")

selects the first table that comes after the b containing "Header 2" in document order
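Both alternatives can be checked side by side. This is a minimal sketch assuming lxml is installed and reusing the question's markup; the variable names are illustrative only.

```python
from lxml import etree

content = """<div> <p><b>Header 1</b></p> <p><b>Header 2</b><br></p>
<table> <tr> <td>Something</td> </tr> </table> </div>"""
tree = etree.HTML(content)

# Alternative 1: from the <p> holding "Header 2", take its first <table> sibling.
by_sibling = tree.xpath("//p[b='Header 2']/following-sibling::table[1]")

# Alternative 2: from the <b> itself, take the first <table> later in the document.
by_following = tree.xpath("//b[.='Header 2']/following::table[1]")

for tables in (by_sibling, by_following):
    # Each xpath() call returns a list; read the cell text of the single match.
    print(len(tables), tables[0].xpath("string(.//td)"))
```

The sibling version only finds a table that shares a parent with the `<p>`, while the `following::` version would also find a table nested inside a later container. Which one is safer depends on how regular the real page is.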

For more information on the different axes, see the XPath 1.0 specification:

the following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes
the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
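The difference between the two axes shows up when starting from the `<b>` element. A minimal sketch, assuming lxml and the question's markup: `following-sibling::` stays inside the parent `<p>`, while `following::` continues through the rest of the document.

```python
from lxml import etree

tree = etree.HTML("""<div> <p><b>Header 1</b></p> <p><b>Header 2</b><br></p>
<table> <tr> <td>Something</td> </tr> </table> </div>""")

header = tree.xpath("//b[.='Header 2']")[0]

# following-sibling:: only sees later children of the same parent (<p>),
# so from <b> it reaches the <br> but never the table.
print([el.tag for el in header.xpath("following-sibling::*")])  # ['br']

# following:: sees every element after <b> in document order (minus its
# descendants), so it reaches the <br>, the table and the table's cells.
print([el.tag for el in header.xpath("following::*")])
```

This is why `//b[.='Header 2']/following-sibling::table` would return nothing, and the sibling-based alternative has to start from the `<p>` instead.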


paul trmbrth Oct 9 '13 at 21:04


Arup rakshit · Accepted Answer · 2013-10-09T18:35:17+0000

You can use the following XPath 1.0 expression, which uses the preceding axis:

  //table[preceding::p[1]/b[.='Header 2']]
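A minimal sketch of this expression in lxml, assuming the question's markup. Because `preceding::` is a reverse axis, `preceding::p[1]` inside the predicate is the `<p>` closest before the table, not the first `<p>` in the document.

```python
from lxml import etree

tree = etree.HTML("""<div> <p><b>Header 1</b></p> <p><b>Header 2</b><br></p>
<table> <tr> <td>Something</td> </tr> </table> </div>""")

# The nearest <p> before the table holds "Header 2", so the table is selected.
tables = tree.xpath("//table[preceding::p[1]/b[.='Header 2']]")
print(len(tables))  # 1

# "Header 1" is in an earlier <p>, not the nearest one, so nothing matches.
print(tree.xpath("//table[preceding::p[1]/b[.='Header 1']]"))  # []
```

The `[1]` ties the table to the header directly above it, which avoids matching a later table that merely has "Header 2" somewhere further back in the page.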

