Open a connection with Jsoup, get the status code, and parse the document

I am creating a class using jsoup that will do the following:

  • The constructor opens a connection to the URL.
  • I have a method that checks the status of a page. i.e. 200, 404, etc.
  • I have a method for parsing a page and returning a list of URLs.

Below is a rough sketch of what I'm trying to do. It is only a sketch, because I have been trying many different things.

    public class ParsePage {
        private String path;
        Connection.Response response = null;

        private ParsePage(String langLocale) {
            try {
                response = Jsoup.connect(path)
                        .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                        .timeout(10000)
                        .execute();
            } catch (IOException e) {
                System.out.println("io - " + e);
            }
        }

        public int getSitemapStatus() {
            int statusCode = response.statusCode();
            return statusCode;
        }

        public ArrayList<String> getUrls() {
            ArrayList<String> urls = new ArrayList<String>();
        }
    }

As you can see, I can get the page status, but I don't know how to get the document for parsing from the connection already opened in the constructor. I tried:

 Document doc = connection.get(); 

But that doesn't work. Any suggestions? Or is there a better way to go about this?

+4
4 answers

As the jsoup documentation for Connection.Response points out, there is a parse() method that parses the response body as a Document and returns it. Once you have the Document, you can do whatever you want with it.

For example, see the getUrls() implementation below:

    import java.io.IOException;
    import java.util.ArrayList;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ParsePage {
        private String path; // never assigned in the question's code; set it before connecting
        Connection.Response response = null;

        private ParsePage(String langLocale) {
            try {
                response = Jsoup.connect(path)
                        .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                        .timeout(10000)
                        .execute();
            } catch (IOException e) {
                System.out.println("io - " + e);
            }
        }

        public int getSitemapStatus() {
            int statusCode = response.statusCode();
            return statusCode;
        }

        // parse() declares IOException, so it is propagated to the caller here
        public ArrayList<String> getUrls() throws IOException {
            ArrayList<String> urls = new ArrayList<String>();
            Document doc = response.parse();
            // do whatever you want, for example retrieving the <url> from the sitemap
            for (Element url : doc.select("url")) {
                urls.add(url.select("loc").text());
            }
            return urls;
        }
    }
+9

If you do not need to log in, use:

 Document doc = Jsoup.connect("url").get(); 

If you need to log in, I would suggest using:

    // Response and Method are org.jsoup.Connection.Response and org.jsoup.Connection.Method
    Response res = Jsoup.connect("url")
            .data("loginField", "yourUser", "passwordField", "yourPassword")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();

    // If you need to stay logged in to the page, keep the cookies
    Map<String, String> cookies = res.cookies();

    // and pass them along with every subsequent connection
    Document pageWhenAlreadyLoggedIn = Jsoup.connect("url").cookies(cookies).get();

For your use case of getting URLs, I would probably try:

    Elements elems = doc.select("a[href]");
    for (Element elem : elems) {
        String link = elem.attr("href");
    }
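One caveat with the loop above: attr("href") returns the attribute exactly as it is written in the page, so the value may be a relative path. jsoup can resolve it for you with elem.attr("abs:href") or elem.absUrl("href") when the document has a base URI, as it does when fetched through Jsoup.connect. To show what that resolution amounts to, here is a minimal sketch using only the standard library. The class name LinkResolver and the example.com URLs are made up for illustration and are not part of jsoup.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of jsoup): resolves raw href values against the page URL.
class LinkResolver {

    // Resolves one href (absolute or relative) against the URL of the page it came from.
    // URI.create throws IllegalArgumentException for malformed input such as unescaped spaces.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    // Resolves a batch of hrefs, skipping empty values and in-page fragments such as "#top".
    static List<String> resolveAll(String pageUrl, List<String> hrefs) {
        List<String> urls = new ArrayList<>();
        for (String href : hrefs) {
            String h = href.trim();
            if (h.isEmpty() || h.startsWith("#")) {
                continue;
            }
            urls.add(resolve(pageUrl, h));
        }
        return urls;
    }

    public static void main(String[] args) {
        String page = "https://example.com/docs/index.html";
        List<String> hrefs = List.of("guide.html", "/about", "#top", "https://other.example/x");
        // prints [https://example.com/docs/guide.html, https://example.com/about, https://other.example/x]
        System.out.println(resolveAll(page, hrefs));
    }
}
```

In real code the jsoup route is probably simpler, since it already knows the document's base URI. The sketch is only meant to make clear why a raw href is not always a usable URL.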

That's about it. Keep up the good work.

+6

You should be able to call parse() on the response object.

 Document doc = response.parse(); 
+2

It seems you want to establish a connection with jsoup, check the status code, and then, depending on that status code, parse the document or do whatever else you need.

To do this, you first need to check the URL status code, and then create a connection.

    Response response = Jsoup.connect("Your URL").followRedirects(false).execute();
    System.out.println(response.statusCode() + " : " + response.url());

response.statusCode() returns the HTTP status code of the response.

After that, you can open the connection that fetches the document:

    if (200 == response.statusCode()) {
        Document doc = Jsoup.connect("Your URL").get();
        Elements elements = doc.select("a[href]");
        /* whatever you want to do */
    }
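Two things are worth keeping in mind with this check. First, because the request above is made with followRedirects(false), a 301 or 302 is returned as-is, so a plain 200 comparison treats redirected pages the same as failures. Second, by default jsoup's execute() throws an HttpStatusException for error statuses such as 404 unless ignoreHttpErrors(true) is set on the connection, so those codes may never reach the if statement. A minimal sketch of a slightly finer classification follows, using only the standard library. The names StatusCheck and Action are hypothetical, and the grouping simply follows the usual HTTP status classes.

```java
// Hypothetical helper (not part of jsoup): decides what to do with a response
// based on the class of its HTTP status code.
class StatusCheck {

    enum Action { PARSE, FOLLOW_REDIRECT, GIVE_UP }

    static Action actionFor(int statusCode) {
        if (statusCode >= 200 && statusCode < 300) {
            return Action.PARSE;            // 2xx: success, the body can be parsed
        }
        if (statusCode >= 300 && statusCode < 400) {
            return Action.FOLLOW_REDIRECT;  // 3xx: usually a redirect, see the Location header
        }
        return Action.GIVE_UP;              // 1xx, 4xx, 5xx and anything unexpected
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 301, 404, 500}) {
            System.out.println(code + " -> " + actionFor(code));
        }
    }
}
```

With something like this, the caller could pass response.statusCode() to actionFor and parse only when the result is PARSE. Whether to follow a redirect manually or simply leave jsoup's default redirect handling on is a design choice that depends on whether you need to see the intermediate status.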

Your class will look like this:

    package com.demo.soup.core;

    import java.io.IOException;

    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    /**
     * The Class DemoConnectionWithJsoup.
     *
     * @author Ankit Sood Apr 21, 2017
     */
    public class DemoConnectionWithJsoup {

        /**
         * The main method.
         *
         * @param args
         *            the arguments
         */
        public static void main(String[] args) {
            Response response;
            try {
                response = Jsoup.connect("Your URL").followRedirects(false).execute();

                /* response.statusCode() will return you the status code */
                if (200 == response.statusCode()) {
                    Document doc = Jsoup.connect("Your URL").get();

                    /* whatever you want to do */
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
+1

Source: https://habr.com/ru/post/1411561/