The webpage source downloaded through Jsoup is not equal to the actual webpage source

I have a serious problem. I have searched all through Stack Overflow and many other sites. Everywhere they give the same solution, and I have tried all of them, but I was not able to solve this problem.

I have the following code:

Document doc = Jsoup.connect(url).timeout(30000).get();

Here I am using the Jsoup library, and the result I get is not equal to the actual page source that we see with right-click on the page -> page source. Many parts are missing from the result that I get with the above line of code. After searching some sites on Google, I saw this method:

// Open a plain connection to the page (webPage holds the page address)
URL url = new URL(webPage);
URLConnection urlConnection = url.openConnection();
urlConnection.setConnectTimeout(10000);
urlConnection.setReadTimeout(10000);
InputStream is = urlConnection.getInputStream();
// No charset is given, so the platform default is used
InputStreamReader isr = new InputStreamReader(is);

// Read the response into a string, 1024 characters at a time
int numCharsRead;
char[] charArray = new char[1024];
StringBuilder sb = new StringBuilder();
while ((numCharsRead = isr.read(charArray)) != -1) {
    sb.append(charArray, 0, numCharsRead);
}
isr.close();
String result = sb.toString();

System.out.println(result);

. , , , charSet - -. ? java. crawler4j, . , . m . , . , !


Try setting the user agent. Some sites send different content depending on the client:

Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")
                    .timeout(30000)
                    .get();
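
To see why the user agent can matter, here is a minimal sketch using only the JDK. It assumes a hypothetical local server (the `UserAgentDemo` class, its `startServer` and `fetch` helpers, and the page contents are all made up for illustration) that returns a reduced page unless the `User-Agent` header starts with `Mozilla/`. Real sites may apply quite different rules.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UserAgentDemo {

    /** Starts a hypothetical local server whose response depends on the User-Agent header. */
    static HttpServer startServer() {
        try {
            // Port 0 lets the operating system pick a free port
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String agent = exchange.getRequestHeaders().getFirst("User-Agent");
                boolean browserLike = agent != null && agent.startsWith("Mozilla/");
                String html = browserLike
                        ? "<html><body><h1>Full page</h1><p>All content</p></body></html>"
                        : "<html><body><h1>Reduced page</h1></body></html>";
                byte[] body = html.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fetches the page source while sending the given User-Agent header. */
    static String fetch(String url, String userAgent) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            conn.setConnectTimeout(10000);
            conn.setReadTimeout(10000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println(fetch(url, "Java/17"));     // the reduced page
            System.out.println(fetch(url, "Mozilla/5.0")); // the full page
        } finally {
            server.stop(0);
        }
    }
}
```

The same URL gives two different sources here, and the only difference between the requests is the header. If a real site behaves like this, a browser-like user agent may be enough to get the missing parts.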

Most likely the page is modified by JavaScript, which Jsoup does not execute. In that case you could use HtmlUnit or Selenium.

UPDATE

, HTML . , - <javascript>, . -, .

Jsoup is not a browser like Chrome, Firefox or IE. Jsoup is an HTML parser.

, , -, - , , . .

One option is HtmlUnit. For a comparison, see: Selenium vs HtmlUnit?

Otherwise, use Selenium WebDriver.


- ? - , - REST API.

, -, - , , - URLConnection.

, :

  • : ( java/) URL-, - .

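  • JavaScript: many pages run JavaScript after they load, and that JavaScript can change the DOM.

  • Browsers such as IE, Firefox and Chrome build their own DOM from the page, and it can differ from the raw source.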

, URL-, URLConnection, , , URL- , - javascript/.

URLConnection / Jsoup will allow you to set request headers as needed, but you can still get a different result due to points 2 and 3. Selenium allows you to remotely control your browser and has an API to access the displayed page. Selenium is normally used for automated testing of web applications.
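
The JavaScript point can be illustrated with a small sketch that needs only the JDK. It assumes a hypothetical page (the `RawSourceDemo` class, its `PAGE` constant and `fetchRawSource` helper are invented for this example) whose visible text is written by a script at load time. A plain HTTP client receives the script as text but never runs it, so the element it would fill stays empty.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class RawSourceDemo {

    // Hypothetical page: the visible text is only added by the script at load time
    static final String PAGE =
            "<html><body><div id=\"content\"></div>"
            + "<script>document.getElementById('content').textContent = 'Filled in ' + 'by script';</script>"
            + "</body></html>";

    /** Serves PAGE from a throwaway local server and returns what a plain HTTP client receives. */
    static String fetchRawSource() {
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();

            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            URLConnection conn = url.openConnection();
            conn.setConnectTimeout(10000);
            conn.setReadTimeout(10000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        String raw = fetchRawSource();
        // The div arrives empty, because nothing has executed the script
        System.out.println(raw.contains("<div id=\"content\"></div>")); // prints true
        // The text the script would produce is not in the downloaded source
        System.out.println(raw.contains("Filled in by script"));        // prints false
    }
}
```

A browser would run the script and show the filled-in text in its DOM. Getting that from Java needs a tool that actually executes JavaScript, such as HtmlUnit or Selenium.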
