How to unescape a JavaScript string on Android?

I am trying to extract the source code of a webpage from a WebView in an Android application. I succeeded using this: http://lexandera.com/2009/01/extracting-html-from-a-webview/

plus this to get it working after KitKat:

    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.KITKAT) {
        // KitKat and later: evaluate the script and receive the result in a callback.
        webView.evaluateJavascript(
            "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
            new ValueCallback<String>() {
                @Override
                public void onReceiveValue(String html) {
                    outputViewer.setText(html);
                }
            });
    } else {
        // Before KitKat: pass the HTML to the HTMLOUT JavaScript interface.
        webView.loadUrl("javascript:window.HTMLOUT.showHTML"
            + "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }

The problem is that on versions before KitKat this returns exactly what I want, but the KitKat branch returns an escaped version of the code, something like this:

 "\u003Chtml>\u003Chead>\n\t\u003Cmeta charset=\"UTF-8\">\n\t\u003Cmeta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n\t\u003Clink rel=\"profile\" href=\"http://gmpg.org/xfn/11\">\n\t\u003Clink rel=\"pingback\" 

Is there a direct way to unescape this string on Android?

Mike

1 answer

I had the same problem. The string looks Java-escaped, and since I already use Apache Commons Lang, this worked for me:

 str = StringEscapeUtils.unescapeJava(str); 

Before:

 "\u003Chtml lang=\"en\">\u003Chead> \u003Cmeta content=\"width=device-width,minimum-scale=1.0\"... 

After:

 "<html lang="en"><head> <meta content="width=device-width,minimum-scale=1.0"... 

I took the code from:

Converts a Unicode character back to a real character
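
If adding Apache Commons Lang is not an option, the same unescaping can be sketched by hand. This is only a minimal sketch, assuming the callback value is a JSON-style string literal (wrapped in double quotes, with `\uXXXX`, `\n`, `\t`, `\"` and similar escapes), as the sample output above suggests. The class name `JsUnescape` and the method `unescapeJsString` are hypothetical and not part of any Android or Apache API.

```java
public final class JsUnescape {
    private JsUnescape() {}

    /**
     * Hypothetical helper: decodes a JSON-style string literal such as the one
     * seen in the evaluateJavascript callback. Unlike unescapeJava, it also
     * drops the surrounding double quotes when both are present.
     */
    public static String unescapeJsString(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw;
        // Strip the outer quotes only if the value is wrapped in a matching pair.
        if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
            s = s.substring(1, s.length() - 1);
        }
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            // Copy ordinary characters (and a dangling trailing backslash) unchanged.
            if (c != '\\' || i >= s.length()) {
                out.append(c);
                continue;
            }
            char e = s.charAt(i++);
            switch (e) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u': {
                    // Expect exactly four hex digits after \u.
                    boolean ok = i + 4 <= s.length();
                    int code = 0;
                    for (int k = 0; ok && k < 4; k++) {
                        int d = Character.digit(s.charAt(i + k), 16);
                        if (d < 0) {
                            ok = false;
                        } else {
                            code = code * 16 + d;
                        }
                    }
                    if (ok) {
                        out.append((char) code);
                        i += 4;
                    } else {
                        out.append("\\u"); // malformed escape: keep it literally
                    }
                    break;
                }
                default:
                    out.append(e); // covers \", \\ and \/
            }
        }
        return out.toString();
    }
}
```

In the callback from the question, this would be used as outputViewer.setText(JsUnescape.unescapeJsString(html)); in place of outputViewer.setText(html);.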
