Is there an object in C# that makes it easy to manipulate the HTML DOM?

If I have a string containing the HTML of a page I just got back from an HTTP POST, how can I turn it into something that lets me easily traverse the DOM?

I noticed the HtmlDocument object, which looks like the right fit, but it has no constructor. Are there any types that make it easy to work with the HTML DOM?

Thanks,
Matt

dom c# html-agility-pack dom-manipulation
1 answer

HtmlDocument (the one in System.Windows.Forms) represents a document that has already been loaded by the WebBrowser control, which is why it has no public constructor.

The Html Agility Pack is the best library I've used for this purpose.

An example from the project's CodePlex wiki:

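HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when no node matches, so check before looping.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink is your own helper, not part of the library
    }
}
doc.Save("file.htm");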

This example loads a file, but you can also parse a string with LoadHtml, and Load has overloads that accept a stream.
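
Since the question starts from a string of HTML rather than a file, here is a minimal sketch of that case. It assumes the HtmlAgilityPack package is referenced; the html value is a made-up stand-in for whatever the HTTP POST returned.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the response body of the HTTP POST.
        string html = "<html><body><a href=\"/a\">A</a><a href=\"/b\">B</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // SelectNodes returns null when nothing matches, so guard before iterating.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // Second argument is the default returned if the attribute is missing.
                Console.WriteLine(link.GetAttributeValue("href", ""));
            }
        }
    }
}
```

If the project also references System.Windows.Forms, the name HtmlDocument becomes ambiguous, so you may need to write HtmlAgilityPack.HtmlDocument in full.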
