Any good Java HTML parsers?

I have used Cobra so far because it was easy to work with, but unfortunately it fails on a few of my test cases. Can anyone recommend a tried and tested library?

I have tried Cobra, the built-in parser, and HTMLCleaner, all with no luck.

0
5 answers

Mozilla HTML Parser looks pretty interesting. By definition, it should be as good as the Gecko engine itself, which is likely to meet your needs.

+1

TagSoup really does a great job with crappy HTML / XHTML.

Jericho and NekoHTML are also worth a look for parsing HTML.
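For anyone who wants to try this, here is a rough sketch of how a SAX-based parser can be driven from Java. TagSoup exposes the standard `org.xml.sax.XMLReader` interface, so the handler code does not depend on the parser itself. The class name `SaxLinkExtractor` and the method `extractLinks` are made up for illustration. The demo uses the JDK's strict XML parser as a stand-in so it runs without extra jars; swapping in TagSoup (see the comment) is an assumption about your classpath, not something tested here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxLinkExtractor {

    /** Collects the href of every <a> element that the given SAX reader reports. */
    static List<String> extractLinks(XMLReader reader, String markup) throws Exception {
        List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // qName is checked because localName can be empty when the
                // reader is not namespace-aware.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = atts.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's XML parser, which only accepts well-formed input.
        // Assumption: with the TagSoup jar on the classpath you would instead write
        //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String page = "<html><body><a href=\"http://example.com/\">one</a>"
                + "<p><a href=\"/two\">two</a></p></body></html>";
        System.out.println(extractLinks(reader, page));
    }
}
```

Because only the `XMLReader` line changes, it is easy to run the same test cases against several SAX-capable parsers and compare results.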

+4

Saxon is another option.

+1


JTidy (http://jtidy.sourceforge.net/) is a Java port of Dave Raggett's HTML Tidy.
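A small sketch of what working with the result might look like: JTidy can hand back a standard `org.w3c.dom.Document`, so the code that reads the tree only needs the W3C DOM interfaces. The class `DomTitleReader` and its methods are hypothetical names for this example. The demo builds the document with the JDK's own `DocumentBuilder` so it runs without JTidy; the JTidy call in the comment is the part you would swap in, and it is untested here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomTitleReader {

    /** Returns the trimmed text of the first <title> element, or null if there is none. */
    static String title(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        if (titles.getLength() == 0) {
            return null;
        }
        StringBuilder text = new StringBuilder();
        collectText(titles.item(0), text);
        return text.toString().trim();
    }

    /** Appends the value of every text node under the given node, in document order. */
    private static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] page = "<html><head><title> Parsers </title></head><body><p>hi</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        // Stand-in: the JDK's XML DocumentBuilder, which needs well-formed input.
        // Assumption: with the JTidy jar on the classpath you would instead write
        //   Document doc = new org.w3c.tidy.Tidy().parseDOM(new ByteArrayInputStream(page), null);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(page));
        System.out.println(title(doc));
    }
}
```

The tree walk sticks to basic DOM calls (`getFirstChild`, `getNextSibling`, `getNodeValue`) on purpose, since not every parser's DOM implementation supports the newer convenience methods.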

+1

The Validator.nu parser implements the HTML5 parsing algorithm. (Compare the Mozilla HTML Parser suggested in another answer.)

+1
