What is the best HTML parser for Java?

Assuming we have to use Java, what is the best HTML parser that is flexible enough to handle many different kinds of HTML content and doesn't require much code for complex parsing tasks?

+6
java html parsing
2 answers

I would recommend Jsoup for this. It has a very nice API with jQuery-like CSS selector support and non-verbose element iteration. To take this very question as an example, the following prints your own question and the names of all answerers:

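    import java.net.URL;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Fetch and parse the page, with a 3000 ms timeout.
    URL url = new URL("https://stackoverflow.com/questions/3121136");
    Document document = Jsoup.parse(url, 3000);

    // Select the question body with a CSS selector and print its text.
    String question = document.select("#question .post-text").text();
    System.out.println("Question: " + question);

    // Select the user links of all answers and print each name.
    Elements answerers = document.select("#answers .user-details a");
    for (Element answerer : answerers) {
        System.out.println("Answerer: " + answerer.text());
    }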

An alternative would be XPath, but Jsoup is more convenient for web developers who are already well versed in CSS selectors.
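For comparison, here is a minimal sketch of the XPath route using only the JDK's built-in `javax.xml.xpath` API. It assumes the input is already well-formed XHTML; real-world HTML usually is not and would first need to be cleaned up by an HTML-aware parser. The class name `XPathSketch`, the method `answererNames`, and the `SAMPLE` markup are made up for illustration and only imitate the structure targeted by the selector above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical, well-formed sample markup (an assumption, not the real page).
    static final String SAMPLE =
        "<html><body>"
        + "<div id=\"answers\">"
        + "<div class=\"user-details\"><a href=\"/users/1\">Alice</a></div>"
        + "<div class=\"user-details\"><a href=\"/users/2\">Bob</a></div>"
        + "</div>"
        + "</body></html>";

    // Returns the text of every link inside a "user-details" div under #answers.
    static List<String> answererNames(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Rough XPath counterpart of the CSS selector "#answers .user-details a".
            // Note: @class='user-details' matches only an exact class attribute value.
            NodeList links = (NodeList) xpath.evaluate(
                "//div[@id='answers']//div[@class='user-details']/a",
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                names.add(links.item(i).getTextContent());
            }
            return names;
        } catch (XPathExpressionException e) {
            // Thrown for an invalid expression or input that is not well-formed XML.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String name : answererNames(SAMPLE)) {
            System.out.println("Answerer: " + name);
        }
    }
}
```

The XPath expression is noticeably more verbose than the equivalent CSS selector, and it breaks on markup that is not well-formed, which is the trade-off the answer refers to.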

+10

The best one is whichever does the job correctly.

There are open-source options such as TagSoup and JTidy.

+1
