I have used Cobra so far because it was easy to use, but unfortunately it fails on a few of my test cases. Does anyone know of a tried-and-tested HTML parsing library for Java?
I tried Cobra, the built-in one, and HTMLCleaner, with no luck.
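Assuming "the built-in one" refers to the JDK's Swing parser (`javax.swing.text.html.parser`; the question does not say which), here is a minimal sketch of how it is driven: `ParserDelegator` walks the markup and pushes each tag to a callback, which here collects the `href` of every `<a>`. The class name `BuiltInLinkExtractor` is hypothetical. This parser is built around an HTML 3.2 DTD, which is one reason it struggles with modern real-world pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects link targets using only the JDK's Swing HTML parser.
public class BuiltInLinkExtractor {

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();

        // The parser calls back into this object for every tag it recognizes.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        // Sloppy but common markup: unclosed <p> elements, no closing </html>.
        String html = "<html><body><p>First <a href=\"a.html\">one</a>"
                + "<p>Second <a href=\"b.html\">two</a></body>";
        System.out.println(extractLinks(html));
    }
}
```

A callback design like this keeps memory use low, but you have to track nesting yourself, which is where a DOM-producing parser is more convenient.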
The Mozilla HTML Parser looks pretty interesting. Since it is built on Gecko's own parser, it should handle markup as well as the Gecko engine itself, which is likely good enough for your needs.
TagSoup really does a great job with crappy HTML / XHTML.
Jericho (or NekoHTML) is another solid choice for parsing HTML.
TagSoup and Jericho are both worth a look, as is NekoHTML.
Saxon is also worth considering.
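JTidy (http://jtidy.sourceforge.net/) is a Java port of Dave Raggett's HTML Tidy. It checks and pretty-prints HTML, cleans up malformed markup, and exposes the result through a DOM interface.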
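Because JTidy's `parseDOM` hands back a standard `org.w3c.dom.Document`, the JDK's own XPath API can query the cleaned page directly. The sketch below avoids the JTidy dependency: it assumes the markup has already been cleaned into well-formed XHTML and uses the JDK's `DocumentBuilder` as a stand-in to produce the `Document`. The class name `XhtmlQuery` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: queries an already well-formed XHTML DOM with standard XPath.
public class XhtmlQuery {

    // Stand-in for the cleaner: parses well-formed XHTML with the JDK's XML parser.
    // In practice the Document would come from JTidy, TagSoup or NekoHTML.
    public static List<String> hrefs(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return hrefs(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    // Works on any org.w3c.dom.Document, regardless of which library built it.
    public static List<String> hrefs(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue()); // attribute node value = href text
        }
        return result;
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p><a href=\"a.html\">one</a></p>"
                + "<p><a href=\"b.html\">two</a></p></body></html>";
        System.out.println(hrefs(xhtml));
    }
}
```

The sample omits an XHTML namespace declaration on purpose; with a namespaced document you would need a namespace-aware factory and a `NamespaceContext` for the XPath.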
The Validator.nu HTML parser implements the HTML5 parsing algorithm. (Mozilla's HTML parser is derived from it.)