How can you parse HTML in Android?

I am building an Android application, and part of its functionality is returning the results of an online search of a library directory. The search is submitted through a custom HTML form, and the results should be displayed in a way that matches the rest of the application. That means the results page needs to be parsed so that only the useful elements are displayed. Is this possible on Android, and if so, how?

+4
3 answers

You can use XmlPullParser for parsing XML.

For an example, see http://developer.android.com/reference/org/xmlpull/v1/XmlPullParser.html

+4

You should use an HTML parser. The one I use, and which works very well, is jsoup; it gives you everything you need to start parsing the HTML. Jericho HTML Parser is another good one.

Fetch the HTML page as a DOM document, then use jsoup's select() method to pick out the elements you want, whether by tag, id, or class.

Solution

Use the Jsoup.connect(String url) method:

  Document doc = Jsoup.connect("http://example.com/").get();

This connects to the HTML page at that URL, fetches it, and stores it as a Document, which is jsoup's DOM representation. You can then read from it with the select() method.

Description

The connect(String url) method creates a new Connection, and get() fetches and parses the HTML page. If an error occurs while fetching the URL, it throws an IOException, which you should handle appropriately.
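As a minimal sketch of that error handling, the helper below wraps connect() and get() in a try/catch. The class name PageFetcher, the method fetchTitle, and the 5-second timeout are illustrative assumptions, not part of jsoup. On Android this call would also need to run off the main thread, and the app needs the INTERNET permission.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class; only Jsoup.connect, timeout, get and title are jsoup API.
class PageFetcher {

    /** Returns the page title, or null if the page could not be fetched. */
    static String fetchTitle(String url) {
        try {
            // timeout is in milliseconds; 5000 is an arbitrary choice for this sketch
            Document doc = Jsoup.connect(url).timeout(5000).get();
            return doc.title();
        } catch (IOException e) {
            // Covers refused connections, timeouts and HTTP error statuses,
            // all of which are reported as IOException or a subclass of it.
            return null;
        }
    }
}
```

Returning null keeps the sketch short; in a real application you would probably want to surface the failure to the user instead.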

The Connection interface is designed for method chaining, so you can build up a specific request:

  Document doc = Jsoup.connect("http://example.com").userAgent("Mozilla").timeout(3000).get();

The jsoup documentation covers the remaining options in detail.

EDIT: here is how you can use the select method:

  // Once the Document has been retrieved as above, use selectors to extract
  // the data you want by tag, id, or CSS class.
  Elements links = doc.select("a[href]");                 // a with href
  Elements pngs = doc.select("img[src$=.png]");           // img with src ending .png
  Element masthead = doc.select("div.masthead").first();  // div with class=masthead
  Elements resultLinks = doc.select("h3.r > a");          // direct a after h3
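To tie the selectors back to the original question, here is a rough sketch that pulls titles and links out of a results page. The class ResultExtractor, the sample markup, and the /book/... paths are made up for illustration; a real results page will need its own selector. It parses an inline string with Jsoup.parse so it runs without a network connection.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper; the markup and selector below are assumptions for this sketch.
class ResultExtractor {

    /** Returns one "title -> absolute URL" string per element matched by cssQuery. */
    static List<String> extract(String html, String baseUri, String cssQuery) {
        // The base URI lets jsoup resolve relative href values to absolute URLs.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> results = new ArrayList<>();
        for (Element link : doc.select(cssQuery)) {
            results.add(link.text() + " -> " + link.absUrl("href"));
        }
        return results;
    }

    public static void main(String[] args) {
        String html =
              "<h3 class='r'><a href='/book/1'>Dune</a></h3>"
            + "<h3 class='r'><a href='/book/2'>Emma</a></h3>";
        for (String line : extract(html, "http://example.com/", "h3.r > a")) {
            System.out.println(line);
        }
    }
}
```

The returned list can then be handed to whatever view the rest of the application uses, such as a list adapter.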

EDIT: with jsoup you can also read attributes, text, and HTML from an element:

  String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
  Document doc = Jsoup.parse(html);  // parse a known snippet so the values below are predictable
  Element link = doc.select("a").first();

  String text = doc.body().text();      // "An example link."
  String linkHref = link.attr("href");  // "http://example.com/"
  String linkText = link.text();        // "example"
  String linkOuterH = link.outerHtml(); // "<a href="http://example.com/"><b>example</b></a>"
  String linkInnerH = link.html();      // "<b>example</b>"

+14

Because the search results are HTML, and HTML is a markup language, you can use Android's XmlPullParser to parse the results.

0
