C# HtmlAgilityPack: select table from a specific h2

I have this HTML:

    <h2>Results</h2>
    <div class="box">
      <table class="tFormat">
        <th>Head</th>
        <tr>1</tr>
      </table>
    </div>
    <h2>Grades</h2>
    <div class="box">
      <table class="tFormat">
        <th>Head</th>
        <tr>1</tr>
      </table>
    </div>

I was wondering how to get only the table in the "Results" section.

I tried:

    var nodes = doc.DocumentNode.SelectNodes("//h2");
    foreach (var o in nodes)
    {
        if (o.InnerText.Equals("Results"))
        {
            foreach (var c in o.SelectNodes("//table"))
            {
                Console.WriteLine(c.InnerText);
            }
        }
    }

It works, but it also returns the table under the "Grades" h2.

2 answers

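Note that the div is not hierarchically inside the header; it is a sibling that follows it, so it makes no sense to look for it there. Also, an XPath expression that starts with // is evaluated from the document root no matter which node you call SelectNodes on, which is why your loop returns every table.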
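
To make the scoping difference concrete, here is a minimal sketch that runs three queries from the same h2 node. It assumes the HtmlAgilityPack NuGet package is referenced; the class name XPathScopeDemo and the helper CountMatches are made up for illustration and are not part of the library.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // The markup from the question, written on two lines.
    public const string Html =
        "<h2>Results</h2><div class=\"box\"><table class=\"tFormat\"><th>Head</th><tr>1</tr></table></div>" +
        "<h2>Grades</h2><div class=\"box\"><table class=\"tFormat\"><th>Head</th><tr>1</tr></table></div>";

    // Hypothetical helper: SelectNodes returns null when nothing matches,
    // so treat that case as a count of 0.
    public static int CountMatches(HtmlNode context, string xpath)
    {
        var nodes = context.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        // First h2 in the document, i.e. the "Results" header.
        var h2 = doc.DocumentNode.SelectSingleNode("//h2");

        // Absolute path: searches the whole document, whatever the context node is.
        Console.WriteLine(CountMatches(h2, "//table"));

        // Relative path: only descendants of the h2 itself.
        Console.WriteLine(CountMatches(h2, ".//table"));

        // Relative path that steps over to the first div after the h2.
        Console.WriteLine(CountMatches(h2, "following-sibling::div[1]//table"));
    }
}
```

For this markup I would expect the three counts to be 2, 0 and 1: the absolute query sees both tables, the h2 has no table descendants, and only the sibling query lands on the single "Results" table.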

This may work for you; it finds the next element after the header:

    if (o.InnerText.Equals("Results"))
    {
        var nextDiv = o.NextSibling;
        while (nextDiv != null && nextDiv.NodeType != HtmlNodeType.Element)
            nextDiv = nextDiv.NextSibling;
        // nextDiv should be correct here.
    }

You can also write a more specific XPath expression that finds only this div:

 doc.DocumentNode.SelectNodes("//h2[text()='Results']/following-sibling::div[1]"); 
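
Building on that expression, the table itself can be selected in one step by appending //table. The following is only a sketch under a few assumptions: the HtmlAgilityPack NuGet package is referenced, the names ResultsTableDemo and FindTableAfterHeading are invented for this example, and the sample cell values are changed to 1 and 2 so the two tables can be told apart.

```csharp
using System;
using HtmlAgilityPack;

public static class ResultsTableDemo
{
    // Sample markup modelled on the question. The cell values 1 and 2 differ
    // here (an assumption for the demo) so the tables are distinguishable.
    public const string Html =
        "<h2>Results</h2><div class=\"box\"><table class=\"tFormat\"><th>Head</th><tr>1</tr></table></div>" +
        "<h2>Grades</h2><div class=\"box\"><table class=\"tFormat\"><th>Head</th><tr>2</tr></table></div>";

    // Hypothetical helper: returns the first table inside the div that
    // directly follows the h2 with the given text, or null if there is none.
    public static HtmlNode FindTableAfterHeading(HtmlDocument doc, string heading)
    {
        // The heading is concatenated into the XPath as-is, so this sketch
        // assumes it contains no single quotes.
        var xpath = "//h2[text()='" + heading + "']/following-sibling::div[1]//table";
        return doc.DocumentNode.SelectSingleNode(xpath);
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        var table = FindTableAfterHeading(doc, "Results");

        // SelectSingleNode returns null when nothing matches, so check first.
        Console.WriteLine(table != null ? table.InnerText : "no table found");
    }
}
```

Checking for null matters here because HtmlAgilityPack returns null rather than an empty result when the heading is missing, and a foreach over a null SelectNodes result would throw.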
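
A variant that checks only the first h2 and, if it is "Results", prints the tables in the div right after it:

    // Requires: using System.Linq;
    var nodes = doc.DocumentNode.SelectNodes("//h2");
    var o = nodes?.FirstOrDefault();   // SelectNodes returns null if nothing matches
    if (o != null && o.InnerText.Equals("Results"))
    {
        // Relative path: only tables in the div right after this h2.
        // ("//table" would search the whole document again.)
        var tables = o.SelectNodes("following-sibling::div[1]//table");
        if (tables != null)
        {
            foreach (var c in tables)
            {
                Console.WriteLine(c.InnerText);
            }
        }
    }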