Can I use HtmlAgilityPack to split an HTML document by a specific tag?

For example, I have a bunch of <tr> tags that I would like to collect. I need to break each of these tags into separate elements in order to simplify parsing on my part.

Is it possible?

Layout Example:

 <tr class="first-in-year">
   <td class="year">2011</td>
   <td class="img"><a href="/battlefield-3/61-27006/"><img src="http://media.giantbomb.com/uploads/6/63038/1700748-bf3_thumb.jpg" alt=""></a></td>
   <td class="title">
     <a href="/battlefield-3/61-27006/">Battlefield 3</a>
     <p class="deck">Battlefield 3 is DICE next installment in the franchise and will be on PC, PS3 and Xbox 360. The game will feature jets, prone, a single-player and co-op campaign, and 64-player multiplayer (on PC). It due out in Fall of 2011.</p>
   </td>
   <td class="date">Expected: Q4 2011</td>
   <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/xbox-360/60-20/" class="X360">X360</a>, <a href="/playstation-3/60-35/" class="PS3">PS3</a></td>
 </tr>
 <tr>
   <td class="year"></td>
   <td class="img"><a href="/forza-motorsport-4/61-33400/"><img src="http://media.giantbomb.com/uploads/0/1992/1654849-forza4_thumb.jpg" alt=""></a></td>
   <td class="title">
     <a href="/forza-motorsport-4/61-33400/">Forza Motorsport 4</a>
     <p class="deck">The next installment of Turn 10 racing franchise slated for release in Fall 2011. It is set to feature 16 player online races, dynamic race conditions, cars from over 80 manufacturers, and compatibility with Kinect, both on and off the racetrack.</p>
   </td>
   <td class="date">Expected: Oct 2011</td>
   <td><a href="/xbox-360/60-20/" class="X360">X360</a></td>
 </tr>
 <tr>
   <td class="year"></td>
   <td class="img"><a href="/max-payne-3/61-23398/"><img src="http://media.giantbomb.com/uploads/0/1400/938434-custom_1237811317319_mp3_poster_thumb.jpg" alt=""></a></td>
   <td class="title">
     <a href="/max-payne-3/61-23398/">Max Payne 3</a>
     <p class="deck">The long awaited third instalment in Remedy beloved series, in which an aging Max Payne faces one final chance to redeem himself.</p>
   </td>
   <td class="date">Expected: 2011</td>
   <td><a href="/pc/60-94/" class="PC">PC</a>, <a href="/playstation-3/60-35/" class="PS3">PS3</a>, <a href="/xbox-360/60-20/" class="X360">X360</a></td>
 </tr>

So, for this example, I would have three elements. :)

1 answer

You cannot split the document into a separate HTML document per tag, if that is what you mean. You can, however, select the individual elements (for example, every TD) and process each one separately.

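The XPath expression //td selects every td element in the document, and you can pass each resulting node to your parsing method. The same approach works for whole rows: //tr gives you one node per tr element.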

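 // LoadHtmlHowever() is a placeholder for however you load the document.
 HtmlAgilityPack.HtmlDocument doc = LoadHtmlHowever();
 // Keep the result; SelectNodes may return null when nothing matches.
 HtmlAgilityPack.HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//td");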
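For the three-row layout in the question, the same idea applied to //tr might look like the sketch below. It is only an illustration: the RowSplitter class and SplitRows helper are hypothetical names, the sample markup is an abbreviated version of the question's rows wrapped in a table element, and the title and date lookups assume the class attributes shown in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowSplitter
{
    // Hypothetical helper: returns one HtmlNode per <tr> found in the markup.
    public static List<HtmlNode> SplitRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes may return null instead of an empty collection
        // when the expression matches nothing, so guard against that.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        return rows == null ? new List<HtmlNode>() : new List<HtmlNode>(rows);
    }

    public static void Main()
    {
        // Abbreviated sample based on the question's layout (assumption).
        string html =
            "<table>" +
            "<tr class=\"first-in-year\"><td class=\"title\"><a href=\"/battlefield-3/61-27006/\">Battlefield 3</a></td>" +
            "<td class=\"date\">Expected: Q4 2011</td></tr>" +
            "<tr><td class=\"title\"><a href=\"/forza-motorsport-4/61-33400/\">Forza Motorsport 4</a></td>" +
            "<td class=\"date\">Expected: Oct 2011</td></tr>" +
            "<tr><td class=\"title\"><a href=\"/max-payne-3/61-23398/\">Max Payne 3</a></td>" +
            "<td class=\"date\">Expected: 2011</td></tr>" +
            "</table>";

        foreach (HtmlNode row in SplitRows(html))
        {
            // The leading "." keeps each XPath relative to the current row;
            // a bare "//td" would search the whole document again.
            HtmlNode title = row.SelectSingleNode(".//td[@class='title']/a");
            HtmlNode date = row.SelectSingleNode(".//td[@class='date']");
            Console.WriteLine("{0} | {1}", title?.InnerText.Trim(), date?.InnerText.Trim());
        }
    }
}
```

Each node in the returned list is a self-contained row, so row.OuterHtml gives you that row's markup as a string if you need to hand it to other code.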