Using XPath to select an item after another

I saw similar questions, but the solutions there did not work on the following. I am far from an XPath expert; I just need to parse the HTML. How can I select the table that follows "Header 2"? I thought my solution below should work, but apparently it does not. Can anyone help me here?

  content = """<div>
  <p><b>Header 1</b></p>
  <p><b>Header 2</b><br></p>
  <table> <tr> <td>Something</td> </tr> </table>
  </div> """

  from lxml import etree

  tree = etree.HTML(content)
  tree.xpath("//table/following::p/b[text()='Header 2']")
xpath lxml
2 answers

You can use the following XPath 1.0 expression, which relies on the preceding axis:

  //table[preceding::p[1]/b[.='Header 2']] 
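As a rough check, the expression can be run against the markup from the question. The sketch below assumes the same `content` string and lxml's HTML parser; the variable names `attempt`, `tables`, and `wrong_header` are only illustrative. It also shows why the original attempt returns nothing: it looks for a `p` located after the table, whereas the header comes before it.

```python
from lxml import etree

content = """<div>
<p><b>Header 1</b></p>
<p><b>Header 2</b><br></p>
<table> <tr> <td>Something</td> </tr> </table>
</div>"""

tree = etree.HTML(content)

# The attempt from the question: starts at the table and looks for a <p>
# that comes AFTER it. No such <p> exists, so the result is empty.
attempt = tree.xpath("//table/following::p/b[text()='Header 2']")

# The expression from this answer: keep a table only if the nearest
# preceding <p> has a <b> child whose string value is 'Header 2'.
tables = tree.xpath("//table[preceding::p[1]/b[.='Header 2']]")

# Same expression with 'Header 1': the nearest preceding <p> is the
# 'Header 2' paragraph, so nothing matches.
wrong_header = tree.xpath("//table[preceding::p[1]/b[.='Header 1']]")

print(attempt)
print(tables[0].xpath("string(.//td)"))
print(wrong_header)
```

Because `preceding` is a reverse axis, `p[1]` refers to the closest `p` before the table rather than the first `p` in the document.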

Some alternatives to @Arup's answer:

 tree.xpath("//p[b='Header 2']/following-sibling::table[1]") 

This selects the first table that is a following sibling of the p whose b child has the text "Header 2".

 tree.xpath("//b[.='Header 2']/following::table[1]") 

This selects the first table, in document order, after the b whose text is "Header 2".

For more information on the different axes, see the XPath 1.0 specification:

  • the following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes

  • the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
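The difference between the two axes only shows up when the table is not a direct sibling of the header paragraph. The sketch below uses a hypothetical variation of the question's markup (the `nested` string, with the table wrapped in an extra `div`, is an assumption made for illustration and is not from the question) and runs both alternative expressions against it.

```python
from lxml import etree

# Hypothetical markup: the table sits inside an extra <div>,
# so it is no longer a sibling of the header <p>.
nested = """<div>
<p><b>Header 2</b></p>
<div><table><tr><td>Nested</td></tr></table></div>
</div>"""

tree = etree.HTML(nested)

# following-sibling only looks at nodes that share the same parent as <p>;
# the table's parent is the inner <div>, so this finds nothing.
sibling = tree.xpath("//p[b='Header 2']/following-sibling::table[1]")

# following looks at every node after <b> in document order
# (excluding its descendants), so the nested table is found.
following = tree.xpath("//b[.='Header 2']/following::table[1]")

print(sibling)
print(following[0].xpath("string(.//td)"))
```

On the markup from the question both expressions return the same table, so the choice mainly matters if the real pages wrap tables in additional elements.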
