HTML parsing libraries for .NET.

I am looking for libraries to parse HTML to extract links, forms, tags, etc.

LGPL or any other commercial-friendly license is preferred.

Do you have experience with one of these libraries? Or could you recommend another similar library?

1 answer

The HTML Agility Pack includes examples of this kind of task and uses XPath for queries, which will feel familiar. For example (from the home page), finding all links is simple:

foreach (HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
{
    //...
}

EDIT

As of 6/19/2012, the above code, as well as the only code example shown on the HTML Agility Pack examples page, will not work. Only a small tweak is required: use doc.DocumentNode instead of doc.DocumentElement, as shown below.

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = Foo(att); // fix the link
}
doc.Save("file.htm");
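Since the question asks about extracting links, forms, and other tags rather than rewriting them, here is a minimal sketch of read-only extraction with the same library. It assumes the HtmlAgilityPack NuGet package is referenced; the class name AttributeExtractor and the method Extract are hypothetical names made up for this illustration, not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class AttributeExtractor
{
    // Returns the value of `attributeName` for every node matched by `xpath`.
    public static List<string> Extract(string html, string xpath, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        var values = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches,
        // so guard before iterating.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            // The second argument is the fallback used if the attribute is missing.
            values.Add(node.GetAttributeValue(attributeName, string.Empty));
        }
        return values;
    }
}
```

With this sketch, links would be collected with AttributeExtractor.Extract(html, "//a[@href]", "href") and form targets with AttributeExtractor.Extract(html, "//form[@action]", "action"). The null check matters because the foreach loops shown above throw a NullReferenceException on a document that contains no matching nodes.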