Parsing a Table Using the Html Agility Pack

I have a table:

    <table>
      <tr class="odd">
        <td class="ind gray">1</td>
        <td><b>acceding</b></td>
        <td class="transcr">[əksˈiːdɪŋ]</td>
        <td class="tran"></td>
      </tr>
      <!-- .... -->
      <tr class="odd">
        <td class="ind gray">999</td>
        <td><b>related</b></td>
        <td class="transcr">[rɪlˈeɪːtɪd]</td>
        <td class="tran"></td>
      </tr>
    </table>

I want to parse three "td" cells from each row. My code:

    Dictionary<string, Word> words = new Dictionary<string, Word>();
    string text = webBrowser1.DocumentText;
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(text);
    for (int i = 0; i < doc.DocumentNode.SelectNodes("//tr").Count; i++)
    {
        HtmlNode node = doc.DocumentNode.SelectNodes("//tr")[i];
        Word word = null;
        if (TryParseWord(node, out word))
        {
            try
            {
                if (!words.ContainsKey(word.eng))
                {
                    words.Add(word.eng, word);
                }
            }
            catch
            {
                continue;
            }
        }
    }

And the function that parses a row:

    private bool TryParseWord(HtmlNode node, out Word word)
    {
        word = null;
        try
        {
            var eng = node.SelectNodes("//td")[1].InnerText;
            var trans = node.SelectNodes("//td")[2].InnerText;
            var rus = node.SelectNodes("//td")[3].InnerText;
            word = new Word();
            word.eng = eng;
            word.rus = rus;
            word.trans = trans;
            return true;
        }
        catch
        {
            word = null;
            return false;
        }
    }

My TryParseWord method always returns the values from the first row, whichever row I pass in. How can I fix this?

2 answers

I can easily get the values this way:

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var table = doc.DocumentNode
        .Descendants("tr")
        .Select(n => n.Elements("td").Select(e => e.InnerText).ToArray());

And use it like this:

    foreach (var tr in table)
    {
        Console.WriteLine("{0} {1} {2} {3}", tr[0], tr[1], tr[2], tr[3]);
    }
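
Since the question ultimately wants a dictionary keyed by the English word, the same projection can feed one directly. The following is only a sketch: the RowReader class and ReadRows method are names made up for illustration, and the filter on rows with fewer than four cells is an added assumption (it keeps header or malformed rows from causing an out-of-range index).

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class RowReader // hypothetical helper, not part of the original answer
{
    // Maps the English word (second cell) to the texts of all cells in its row.
    public static Dictionary<string, string[]> ReadRows(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var rows = doc.DocumentNode
            .Descendants("tr")
            .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToArray())
            .Where(cells => cells.Length >= 4); // assumption: skip rows without all four cells

        var result = new Dictionary<string, string[]>();
        foreach (var cells in rows)
        {
            // Keep the first occurrence of a word, as the question's loop does.
            if (!result.ContainsKey(cells[1]))
            {
                result.Add(cells[1], cells);
            }
        }
        return result;
    }
}
```

Because Elements("td") is called on each row node, every array holds only that row's cells, which sidesteps the problem described in the question.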

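You need to change the XPath so that it does not start matching from the document root. An expression beginning with "//" searches the whole document, no matter which node you call it on. Use this instead: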

 node.SelectNodes(".//td")[1] 

The leading dot makes the XPath relative to the current node, so it matches only the td elements inside that tr.
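
Applied to the question's method, the relative XPath might look like the sketch below. The Word class here is a hypothetical stand-in with the three fields the question uses (its real definition is not shown), and the explicit null and count check is an added assumption that replaces the catch-all try/catch.

```csharp
using HtmlAgilityPack;

// Hypothetical stand-in for the question's Word type; the original is not shown.
public class Word
{
    public string eng;
    public string trans;
    public string rus;
}

public static class WordParser
{
    // Reads the word, transcription and translation cells of a single <tr>.
    public static bool TryParseWord(HtmlNode row, out Word word)
    {
        word = null;

        // ".//td" is evaluated relative to the row, so only its own cells match.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection cells = row.SelectNodes(".//td");
        if (cells == null || cells.Count < 4)
        {
            return false;
        }

        word = new Word
        {
            eng = cells[1].InnerText.Trim(),
            trans = cells[2].InnerText.Trim(),
            rus = cells[3].InnerText.Trim()
        };
        return true;
    }
}
```

Selecting the cells once and indexing into the collection also avoids running the same query three times per row.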


Source: https://habr.com/ru/post/1412271/

