This is a sample of the HTML that I am trying to parse using the Html Agility Pack in ASP.NET (C#):
<div class="content-div">
  <dl>
    <dt>
      <b><a href="1.html" title="1">1</a></b>
    </dt>
    <dd> First Entry</dd>
    <dt>
      <b><a href="2.html" title="2">2</a></b>
    </dt>
    <dd> Second Entry</dd>
    <dt>
      <b><a href="3.html" title="3">3</a></b>
    </dt>
    <dd> Third Entry</dd>
  </dl>
</div>
The values I want are:
- Hyperlink → 1.html
- Anchor text → 1
- Inner text of dd → First Entry
(These examples are from the first entry, but I need these values for every entry in the list.)
This is the code I'm currently using:
var webGet = new HtmlWeb();
var document = webGet.Load(url2);
var parsedValues =
    from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
    from content in info.SelectNodes("dl//dd")
    from link in info.SelectNodes("dl//dt/b/a")
        .Where(x => x.Attributes.Contains("href"))
    select new
    {
        Text = content.InnerText,
        Url = link.Attributes["href"].Value,
        AnchorText = link.InnerText,
    };
GridView1.DataSource = parsedValues;
GridView1.DataBind();
The problem is that the link and the anchor text come out correctly, but the inner text does not: the first entry's text is repeated once for every link, then the second entry's text is repeated for every link, and so on. That is hard to explain clearly, so here is the output I get with this code:
First Entry 1.html 1
First Entry 2.html 2
First Entry 3.html 3
Second Entry 1.html 1
Second Entry 2.html 2
Second Entry 3.html 3
Third Entry 1.html 1
Third Entry 2.html 2
Third Entry 3.html 3
The output I want is:
First Entry 1.html 1
Second Entry 2.html 2
Third Entry 3.html 3
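To check my understanding of why the rows multiply, here is a minimal sketch that leaves HAP out entirely. The class `PairingSketch` and the arrays `Texts` and `Urls` are hypothetical stand-ins for the dd texts and the href values, not part of my real code. It suggests that two independent `from` clauses combine every text with every link, whereas pairing by position (here with `Enumerable.Zip`) gives one row per entry:

```csharp
using System;
using System.Linq;

public static class PairingSketch
{
    // Hypothetical stand-ins for the <dd> inner texts and the <a> href values.
    static readonly string[] Texts = { "First Entry", "Second Entry", "Third Entry" };
    static readonly string[] Urls = { "1.html", "2.html", "3.html" };

    // Two independent `from` clauses: every text is combined with every url.
    public static string[] CrossJoin() =>
        (from text in Texts
         from url in Urls
         select text + " " + url).ToArray();

    // Zip walks both sequences in step and pairs items by position.
    public static string[] Paired() =>
        Texts.Zip(Urls, (text, url) => text + " " + url).ToArray();

    public static void Main()
    {
        Console.WriteLine(CrossJoin().Length); // 9 rows: 3 texts x 3 urls
        Console.WriteLine(Paired().Length);    // 3 rows: one per entry
        Console.WriteLine(string.Join(Environment.NewLine, Paired()));
    }
}
```

Applied to my query, this would presumably mean zipping the `dl//dd` nodes with the `dl//dt/b/a` nodes instead of nesting the two `from` clauses. That assumes every dt has exactly one matching dd, in the same document order.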