HTML parsing libraries for .NET.

Question

I am looking for libraries to parse HTML to extract links, forms, tags, etc.

LGPL or any other license that is friendly to commercial development is preferred.

Do you have experience with one of these libraries? Or could you recommend another similar library?

dom html .net parsing

dr. evil Mar 17 '09 at 8:03

1 answer

Marc Gravell · Accepted Answer · 2009-03-17T08:08:30+0000

The HTML Agility Pack includes examples of this kind of task and uses XPath, so queries look familiar. For example (from the home page), finding all links is simple:

    foreach (HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
    {
        //...
    }

EDIT

As of 6/19/2012, the above code, as well as the only code example shown on the HTML Agility Pack examples page, will not work. Only a small tweak is required: use doc.DocumentNode instead of doc.DocumentElement, as in the following.

    HtmlDocument doc = new HtmlDocument();
    doc.Load("file.htm");
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = Foo(att); // fix the link
    }
    doc.Save("file.htm");
