How to get img/src or a/href values using the Html Agility Pack?

I want to use the Html Agility Pack to parse img src and a href values from an HTML page, but I don't know much about XML or XPath. I have read the help documentation on many websites, but I still can't solve the problem. I am using C# in Visual Studio 2005. I am not fluent in English, so I would sincerely appreciate some working code.

+9
5 answers

The first example on the home page does something very similar, but consider:

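    HtmlDocument doc = new HtmlDocument();
    doc.Load("file.htm"); // would need doc.LoadHtml(htmlSource) if it is not a file
    // DocumentNode is the root node in current releases (the original sample used DocumentElement)
    // note: SelectNodes returns null when nothing matches
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        string href = link.Attributes["href"].Value;
        // store href somewhere
    }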

As you might expect, for img/@src you just replace each a with img and each href with src. You might even be able to simplify to:

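    // the XPath selects the elements; the matching attribute is then read from each node
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a[@href] | //img[@src]"))
    {
        string name = node.Name == "a" ? "href" : "src";
        list.Add(node.Attributes[name].Value);
    }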

For relative URL handling, look at the Uri class.
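
As a rough illustration of the Uri suggestion, here is a minimal sketch that resolves an href or src value against the address the page was loaded from. The class name LinkResolver, the method ToAbsolute, and the example.com addresses are made up for this example; only System.Uri is used, and the syntax stays within what C# 2.0 supports.

```csharp
using System;

static class LinkResolver
{
    // Hypothetical helper: turns an href/src value into an absolute URL,
    // or returns null if it cannot be combined with the page address.
    public static string ToAbsolute(string pageUrl, string link)
    {
        Uri baseUri = new Uri(pageUrl); // address the HTML was loaded from
        Uri result;
        if (Uri.TryCreate(baseUri, link, out result))
        {
            return result.AbsoluteUri;
        }
        return null;
    }
}
```

For instance, LinkResolver.ToAbsolute("http://example.com/a/b.html", "../c.png") yields "http://example.com/c.png", while a link that is already absolute is returned unchanged.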

+22

The example in the accepted answer did not compile for me with the latest version, so I tried something else:

    // needs: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
    private List<string> ParseLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        // SelectNodes returns null when nothing matches
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.Attributes["href"].Value).ToList();
    }

This works for me.

+6

Maybe I'm too late here to post an answer. The following worked for me:

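    // MainImageNode is the HtmlNode of an <img> element; the result is its src HtmlAttribute, or null if there is none
    var MainImageString = MainImageNode.Attributes.Where(i => i.Name == "src").FirstOrDefault();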
+1

You also need to take into account the document's base URL element (<base>) and protocol-relative URLs (for example, //www.foo.com/bar/).
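
One possible way to handle both cases with System.Uri is sketched below. The names BaseAwareResolver and Resolve are hypothetical, and the sketch assumes the caller has already read the href of the page's <base> element (for example with SelectSingleNode("//base[@href]")) and passes null when the page has none. Protocol-relative links are handled explicitly by borrowing the scheme of the effective base.

```csharp
using System;

static class BaseAwareResolver
{
    // Hypothetical helper. baseHref is the href of the page's <base> element,
    // or null when the page has none.
    public static string Resolve(string pageUrl, string baseHref, string link)
    {
        Uri effectiveBase = new Uri(pageUrl);
        if (!string.IsNullOrEmpty(baseHref))
        {
            // <base href> may itself be relative to the page address
            effectiveBase = new Uri(effectiveBase, baseHref);
        }

        if (link.StartsWith("//", StringComparison.Ordinal))
        {
            // protocol-relative link: reuse the scheme of the effective base
            return new Uri(effectiveBase.Scheme + ":" + link).AbsoluteUri;
        }

        // throws UriFormatException if the link cannot be parsed
        return new Uri(effectiveBase, link).AbsoluteUri;
    }
}
```

With a page at https://example.com/dir/page.html and a base href of /static/, the link img/a.png resolves to https://example.com/static/img/a.png; with no base element, //www.foo.com/bar/ resolves to https://www.foo.com/bar/.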

For more information check:

0
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    string name = htmlDoc.DocumentNode
        .SelectNodes("//td/input")
        .First()
        .Attributes["value"].Value;

Source: https://html-agility-pack.net/select-nodes

0