I am trying to create an application that scrapes the contents of several pages on a site. I am using JSoup to connect. This is my code:
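```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Inside a method; langList, sitemapDomain, sitemapName and sitemapPath are defined elsewhere.
for (String locale : langList) {
    sitemapPath = sitemapDomain + "/" + locale + "/" + sitemapName;
    try {
        Document doc = Jsoup.connect(sitemapPath)
                .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                .timeout(10000)
                .get();
        // Each <loc> element in the sitemap holds one page URL
        Elements element = doc.select("loc");
        for (Element urls : element) {
            System.out.println(urls.text());
        }
    } catch (IOException e) {
        System.out.println(e);
    }
}
```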
Everything works fine in most cases. However, there are a few things I want to do.
Firstly, some requests return a status such as 404 or 500, or maybe 301. With the code above, it just prints an error and moves on to the next URL. What I would like is to get the status of every link: if the page connects, print 200; if it does not, print the corresponding status code.
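A minimal sketch of one way to get the status for every link. It deliberately swaps Jsoup for the JDK's own `HttpURLConnection` so the example has no dependencies: `getResponseCode()` returns the status for 4xx and 5xx responses instead of throwing, and disabling redirect following keeps a 301 visible. `StatusChecker` and `statusOf` are hypothetical names. In Jsoup itself, the corresponding settings are `Connection.ignoreHttpErrors(true)` (so `execute()` does not throw on 4xx/5xx), `followRedirects(false)`, and `Connection.Response.statusCode()`; alternatively, the existing `catch` can catch `org.jsoup.HttpStatusException` first and read its `getStatusCode()`.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch: report the HTTP status of a URL instead of treating 4xx/5xx as errors.
// Redirects are not followed, so a 301 is reported as 301 rather than as the
// status of the page it points to.
final class StatusChecker {

    private StatusChecker() {
    }

    /** Returns the HTTP status code for url, e.g. 200, 301, 404 or 500. */
    static int statusOf(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setInstanceFollowRedirects(false); // keep 301/302 visible
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            // Unlike getInputStream(), getResponseCode() does not throw for a
            // 404 or 500; it throws IOException only when the connection itself
            // fails (for example, on a timeout).
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, calling `StatusChecker.statusOf(urls.text(), 10000)` inside the inner loop and printing the result would show 200, 301, 404 or 500 next to each sitemap URL. A timeout still surfaces as an `IOException`, since in that case there is no status code to report.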
Secondly, I sometimes get a `java.net.SocketTimeoutException`. I could increase the timeout, but I would rather try to connect 3 times; if the third attempt also fails, I want to add the URL to a "fail" list so I can retry the failed connections later.
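A sketch of the retry-then-record idea, under the assumption that only timeouts are worth retrying. `RetryingFetcher`, `Fetch` and `failedUrls()` are hypothetical names; the actual request is passed in as a lambda, so the same helper can wrap either the sitemap request or the per-page requests. Any other `IOException` (for example Jsoup's `HttpStatusException` on a 404, which extends `IOException`) is rethrown immediately, since repeating the request is unlikely to change the answer.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch: try a fetch up to maxAttempts times, retrying only on
// SocketTimeoutException. URLs that time out on every attempt are kept in a
// "failed" list so they can be retried later.
final class RetryingFetcher {

    /** Hypothetical "fetch this URL" callback; a Jsoup connect(...).get() call fits it. */
    interface Fetch<T> {
        T fetch(String url) throws IOException;
    }

    private final int maxAttempts;
    private final List<String> failed = new ArrayList<>();

    RetryingFetcher(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * Returns the fetched value, or Optional.empty() if every attempt timed out
     * (the URL is then added to the failed list). Any other IOException, such as
     * an HTTP error status, is rethrown immediately without retrying.
     */
    <T> Optional<T> fetch(String url, Fetch<T> fetcher) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(fetcher.fetch(url));
            } catch (SocketTimeoutException e) {
                System.out.println("Timeout " + attempt + "/" + maxAttempts + " for " + url);
            }
        }
        failed.add(url);
        return Optional.empty();
    }

    /** URLs that timed out on every attempt, in the order they failed. */
    List<String> failedUrls() {
        return new ArrayList<>(failed);
    }
}
```

With Jsoup, the request in the question could be wrapped as `Optional<Document> doc = fetcher.fetch(sitemapPath, u -> Jsoup.connect(u).timeout(10000).get());`, and `fetcher.failedUrls()` iterated afterwards to retry the stragglers. Retrying immediately is the simplest policy; adding a short `Thread.sleep` between attempts is a common refinement when the server is overloaded.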
Can someone with more knowledge than me help me?
java jsoup connection
Peck3277