Any good Java HTML parsers?

Question


I have used Cobra so far because it was so easy to use, but unfortunately it had problems with a few test cases. Does anyone know of a tried-and-tested library?

I have tried Cobra, the built-in parser and HtmlCleaner, all with no luck.


java html xpath

Legend Nov 26 '09 at 23:37


5 answers

TagSoup really does a great job with crappy HTML / XHTML.

Jericho (and NekoHTML) are also good options for parsing HTML.

See the TagSoup, Jericho and NekoHTML project pages for details.
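TagSoup exposes its parser as a standard SAX `XMLReader` (the class `org.ccil.cowan.tagsoup.Parser`), so its events can be captured into a DOM using nothing but JAXP. A minimal sketch, assuming JDK 8+; `SaxToDom` and its methods are hypothetical names, and the demo plugs in the JDK's own strict SAX parser so it runs without the TagSoup jar:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToDom {

    /**
     * Runs any SAX XMLReader over the markup and captures its events as a DOM tree,
     * using an identity Transformer (SAXSource in, DOMResult out).
     */
    public static Document toDom(XMLReader reader, String markup) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        DOMResult result = new DOMResult(); // no node given, so the transformer creates a Document
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(source, result);
        return (Document) result.getNode();
    }

    /** The JDK's own SAX parser: it only accepts well-formed XML/XHTML. */
    public static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // With TagSoup on the classpath, `new org.ccil.cowan.tagsoup.Parser()` could be
        // passed here instead of jdkReader(), so that malformed HTML is accepted too.
        Document doc = toDom(jdkReader(), "<html><body><p>Hello</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName()); // html
    }
}
```

The identity `Transformer` does not care which `XMLReader` produced the events, so the strict JDK parser (which rejects unclosed tags) can be swapped for a forgiving one without touching the rest of the code.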


Pascal Thivent Nov 27 '09 at 0:53

Saxon (an XSLT, XQuery and XPath processor).
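Once the HTML is available as a DOM, XPath queries can be run with the JDK's built-in `javax.xml.xpath` API; Saxon offers a more complete XPath 2.0+/XQuery engine beyond that. A minimal sketch, assuming well-formed XHTML input (real-world tag soup would first need a forgiving parser such as TagSoup); `XPathOverHtml` is a hypothetical class name:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathOverHtml {

    /** Evaluates an XPath expression and returns the string value of every matching node. */
    public static List<String> select(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent()); // for an attribute node, this is its value
        }
        return values;
    }

    /** Parses well-formed XHTML with the JDK's DOM parser (malformed HTML is rejected here). */
    public static Document parseXhtml(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXhtml(
                "<html><body><a href=\"/one\">1</a><p><a href=\"/two\">2</a></p></body></html>");
        System.out.println(select(doc, "//a/@href")); // [/one, /two]
    }
}
```

If the upstream parser puts elements in the XHTML namespace (TagSoup does this by default), `//a` will not match; either register a `NamespaceContext` on the `XPath` object or match with `//*[local-name()='a']`.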


Jim Garrison Nov 26 '09 at 23:57


JTidy (http://jtidy.sourceforge.net/) is a Java port of Dave Raggett's HTML Tidy.


peter.murray.rust Nov 28 '09 at 6:47

The Validator.nu HTML parser is a Java implementation of the HTML5 parsing algorithm. (Mozilla's new HTML parser is based on it.)


Ms2ger Nov 28 '09 at 13:51


Pavel Minaev · Accepted Answer · 2009-11-27T00:11:07+0000

Mozilla HTML Parser looks pretty interesting. By definition, it should be as good as the Gecko engine itself, which is likely to meet your needs.
