You need to use an HTML parser. The one I use, and which works very well, is Jsoup; this is where you should start with parsing HTML. Jericho HTML Parser is another good one.
Fetch the HTML document as a DOM and use Jsoup's select() method to pick out the elements you want, whether by tag, id, or class.
Solution
Use the Jsoup.connect(String url) method:

    Document doc = Jsoup.connect("http://example.com/").get();

This connects to the HTML page at the given URL and returns it as a Document, a DOM representation of the page. You can then read from it using the select() method.
Description
The connect(String url) method creates a new Connection, and get() fetches and parses the HTML file. If an error occurs while fetching the URL, it throws an IOException, which you should handle appropriately.
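As a minimal sketch of the IOException handling described above: the class name FetchPage, the helper fetch, and the 5000 ms timeout are illustrative assumptions, not part of Jsoup. Returning an empty Optional on failure is just one reasonable choice; rethrowing or retrying may suit your case better.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchPage {

    /** Fetches and parses the page at url, or returns empty if the request fails. */
    static Optional<Document> fetch(String url) {
        try {
            // The timeout is in milliseconds; 5000 is an arbitrary illustrative value
            return Optional.of(Jsoup.connect(url).timeout(5000).get());
        } catch (IOException e) {
            // Covers network failures, timeouts, and HTTP error statuses
            System.err.println("Could not fetch " + url + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Prints the page title only if the fetch succeeded
        fetch("http://example.com/")
            .ifPresent(doc -> System.out.println(doc.title()));
    }
}
```

Note that a malformed URL is reported with an IllegalArgumentException rather than an IOException, so this sketch does not catch it.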
The Connection interface is designed for method chaining to build specific requests:

    Document doc = Jsoup.connect("http://example.com")
        .data("query", "Java")
        .userAgent("Mozilla")
        .cookie("auth", "token")
        .timeout(3000)
        .post();
If you read through the Jsoup documentation, you should be able to achieve this.
EDIT: Here is how you can use the select() method:
    // Once the Document is retrieved above, use these selector methods to
    // extract the data you want by tag, id, or CSS class.
    Elements links = doc.select("a[href]");                 // a with href
    Elements pngs = doc.select("img[src$=.png]");           // img with src ending .png
    Element masthead = doc.select("div.masthead").first();  // div with class=masthead
    Elements resultLinks = doc.select("h3.r > a");          // direct a after h3
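To try these selectors without a network call, you can parse an HTML string directly with Jsoup.parse(String). The sketch below does that; the class name SelectExample, the helper sample(), and the sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectExample {

    // Hypothetical sample markup, used so the example runs offline
    static final String HTML =
        "<div class=\"masthead\"><h1 id=\"title\">News</h1></div>"
      + "<a href=\"/about\">About</a>"
      + "<a name=\"top\">No link here</a>"
      + "<img src=\"logo.png\"><img src=\"photo.jpg\">";

    /** Parses the sample markup into a Document. */
    static Document sample() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = sample();

        Elements links = doc.select("a[href]");                 // by tag plus attribute
        Elements pngs = doc.select("img[src$=.png]");           // by attribute suffix
        Element masthead = doc.select("div.masthead").first();  // by class
        Element title = doc.select("#title").first();           // by id

        System.out.println(links.size());                // 1 (the second <a> has no href)
        System.out.println(links.first().attr("href"));  // /about
        System.out.println(pngs.size());                 // 1
        System.out.println(masthead.text());             // News
        System.out.println(title.text());                // News
    }
}
```

Keep in mind that first() returns null when nothing matches, so check the result before calling methods on it when working with real pages.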
EDIT: With Jsoup you can use the following to get attributes, text, and HTML:
    // The values in the comments are illustrative. They assume a page whose body is:
    // <p>An <a href='http://example.com/'><b>example</b></a> link.</p>
    Document doc = Jsoup.connect("http://example.com").get();
    Element link = doc.select("a").first();

    String text = doc.body().text();       // "An example link."
    String linkHref = link.attr("href");   // "http://example.com/"
    String linkText = link.text();         // "example"
    String linkOuterH = link.outerHtml();  // "<a href="http://example.com/"><b>example</b></a>"
    String linkInnerH = link.html();       // "<b>example</b>"