Extracting text with Java HTML parsers

I want to use an HTML parser that does the following in a clean, elegant way:

  • Extract text (this is most important)
  • Extract links, meta keywords
  • Restore original document (optional, but nice feature)

From my research so far, Jericho seems to fit. Are there any other open source libraries you would recommend?

0
3 answers

Take a look at HtmlCleaner and CyberNekoHtml. CyberNekoHtml is a DOM/SAX parser, so it gives you a DOM of the HTML to work with.

A list of open source Java HTML parsers: http://java-source.net/open-source/html-parsers

+2

Try JSoup.

+1

I ended up using HtmlCleaner (http://htmlcleaner.sourceforge.net/) for something like this. It was really easy to use and fast enough for what I needed.
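For comparison, here is a rough sketch of the three extraction tasks from the question (text, links, meta keywords) with no external dependency. Note that it does not use HtmlCleaner or any other library named in this thread: it swaps in the old Swing HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`), which is much less forgiving of modern or messy markup. The class name `HtmlExtractSketch`, the `Collector` helper and the `extract` method are made up for illustration, and the sketch assumes a JDK that includes the `java.desktop` module.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of any library mentioned in the thread.
class HtmlExtractSketch {

    /** Collects text chunks, link targets and meta keywords during parsing. */
    static class Collector extends HTMLEditorKit.ParserCallback {
        final StringBuilder text = new StringBuilder();
        final List<String> links = new ArrayList<>();
        String keywords = "";

        @Override
        public void handleText(char[] data, int pos) {
            // Join text chunks with a single space.
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(data);
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            inspect(tag, attrs);
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // Empty elements such as <meta> are reported here.
            inspect(tag, attrs);
        }

        private void inspect(HTML.Tag tag, MutableAttributeSet attrs) {
            if (tag == HTML.Tag.A) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            } else if (tag == HTML.Tag.META) {
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (name != null && content != null
                        && "keywords".equalsIgnoreCase(name.toString())) {
                    keywords = content.toString();
                }
            }
        }
    }

    /** Parses the given HTML string and returns what was collected. */
    static Collector extract(String html) {
        Collector collector = new Collector();
        try {
            // The third argument (true) tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector;
    }
}
```

Calling `HtmlExtractSketch.extract(html)` returns a `Collector` whose `text`, `links` and `keywords` fields hold the results. It does not cover the optional "restore original document" requirement, and for real-world pages one of the libraries suggested above is probably the better choice.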

0