Scraping dynamically generated HTML inside an Android application

I am currently writing an Android application that, among other things, uses textual information from sites I don't control. In addition, some of those pages require authentication.

For some pages, I was able to log in and get the HTML source using HttpClient with BasicNameValuePair and its associated objects.
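For reference, the login step can be sketched like this using the standard-library `HttpURLConnection` (the `HttpClient`/`BasicNameValuePair` version mentioned above is analogous). The login URL and form-field names here are hypothetical placeholders, not from the original post:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class LoginFetch {

    // Encode form fields as an application/x-www-form-urlencoded body
    // (the stdlib equivalent of a list of BasicNameValuePairs).
    public static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // POST the login form and return the raw response body as a string.
    public static String fetchHtml(String loginUrl, Map<String, String> fields) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(formEncode(fields).getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) html.append(line).append('\n');
        }
        return html.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");   // hypothetical field names --
        fields.put("password", "secret");  // inspect the target site's login form
        System.out.println(fetchHtml("https://example.com/login", fields));
    }
}
```

Real logins usually also require carrying the session cookie across requests, e.g. by installing `java.net.CookieHandler.setDefault(new java.net.CookieManager())` before the POST.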

Unfortunately, these methods fetch the page source without executing any of the JavaScript that a browser (e.g. an Android WebView) would normally run. I need text that some of those scripts generate.

I did my research, but everything I found was speculative and extremely confusing. I'm fine with ignoring the pages that require a login. I'm also happy to post any code that might be useful in building a solution; this is an independent project.

Are there any concrete solutions for scraping HTML that is produced by JavaScript calls? An example would be absolutely first-class.

2 answers

The solutions above are very slow and effectively limit you to one URL (well, not really, but I dare you to scrape 10 URLs with Rhino while your user is eagerly waiting for the results).

An alternative is to use a cloud scraping service. You get the benefit of not wasting your phone's bandwidth on downloading content you won't use.

Try this solution: Bobik Java SDK

It lets you scrape hundreds of sites in a matter of seconds.


Eventual success: Rhino.

Other things I've tried:

  • HttpClient (bundled with Android)
    • Cannot execute JavaScript
  • HtmlUnit
    • Four hours with no success. Also huge: it added 12 MB to my APK.
  • SL4A
    • Finally got it to compile (use THIS setup guide). Abandoned as overkill for what plain Rhino can do.

Things that may work:

  • Selenium

I will post further results as I get them; results from others will be added as they are reported.

Note: many of the options listed above reference one another. I believe Rhino is included in both SL4A and HtmlUnit. Also, I think Selenium can drive HtmlUnit.
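Since Rhino is what ultimately worked here, the basic pattern can be sketched as below. This assumes the Rhino jar (`org.mozilla.javascript`) is on the classpath; the script string stands in for JavaScript extracted from a scraped page. Setting the optimization level to -1 forces interpreted mode, which is required on Android because Dalvik cannot load the JVM bytecode Rhino would otherwise generate:

```java
import org.mozilla.javascript.Context;
import org.mozilla.javascript.Scriptable;

public class RhinoEval {
    // Evaluate a JavaScript snippet and return its result as a string.
    public static String evalJs(String js) {
        Context ctx = Context.enter();
        try {
            ctx.setOptimizationLevel(-1); // interpreted mode: needed on Android/Dalvik
            Scriptable scope = ctx.initStandardObjects();
            Object result = ctx.evaluateString(scope, js, "script", 1, null);
            return Context.toString(result);
        } finally {
            Context.exit(); // always release the per-thread Context
        }
    }

    public static void main(String[] args) {
        // e.g. run a script pulled out of the downloaded HTML
        System.out.println(evalJs("var s = 'dynamic'; s + ' text';"));
    }
}
```

Note that plain Rhino only gives you a JavaScript engine, not a browser: page scripts that touch `document` or `window` will need those objects stubbed out in the scope first.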

