Open an HTML document with xml.Load

I would like to open an HTML document (received as a string from a StreamReader, fetched from the Internet) and create an XmlDocument from it, like this:

XmlDocument doc = new XmlDocument();
doc.Load(string containing the retrieved document);

But the HTML document contains this declaration:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" > 

and the parser tells me that the document is invalid. Is there any way around this?

+4
4 answers

Ordinary HTML, even if it is valid HTML, is generally not well-formed XML.

HtmlAgilityPack is a popular third-party open source library that you can use to solve this problem.
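As a rough illustration of that suggestion, here is a minimal sketch that parses an HTML string with HtmlAgilityPack and then queries it with XPath. It assumes the HtmlAgilityPack NuGet package is referenced; the sample markup and the XPath query are made up for the example.

```csharp
using System;
using HtmlAgilityPack; // third-party package, assumed to be installed from NuGet

class HtmlAgilityPackSketch
{
    static void Main()
    {
        // Hypothetical input: in the question this string would come from a StreamReader.
        string html =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" +
            "<html><body><p>Hello<br>world</p>" +
            "<a href=\"http://example.com\">a link</a></body></html>";

        // HtmlDocument tolerates markup that is not well-formed XML (e.g. the unclosed <br>).
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so check before iterating.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", string.Empty));
            }
        }
    }
}
```

The resulting tree is an HtmlAgilityPack HtmlDocument, not a System.Xml.XmlDocument, but it can be navigated with XPath in much the same way.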

+2

If you are sure that the HTML is valid XML, I suppose you could just replace the HTML header (the DOCTYPE declaration) with an XML header.
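A related option, sketched below, is to leave the string untouched and ask the XML reader to skip the DOCTYPE instead of editing it by hand. This swaps in a different technique from the string replacement described above, and it only works if the rest of the document really is well-formed XML. Note also that XmlDocument.Load(string) treats its argument as a file name or URL, so markup held in a string has to go through LoadXml or a reader. The sample markup is made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;

class IgnoreDoctypeSketch
{
    static void Main()
    {
        // Hypothetical input: well-formed XHTML that carries the DOCTYPE from the question.
        string xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">" +
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>Test</title></head><body><p>Hello</p></body></html>";

        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE rather than rejecting it or downloading the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            // Do not resolve external resources at all.
            XmlResolver = null
        };

        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            doc.Load(reader);
        }

        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```

Because the DTD is skipped, named HTML entities such as &nbsp; are not defined and will still cause a parse error, so this sketch suits documents that avoid them.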

0

First, check that the document is valid XHTML (which means that it is also well-formed XML).

Paste the XHTML code here and view the output: http://validator.w3.org/#validate_by_input

Good luck!

0

You can use HTML Tidy (Tidy.NET) for this.

0