Can jsoup handle meta refresh redirects?

I have a problem with jsoup. I am trying to fetch a document from a URL that redirects to another URL through a meta refresh tag, but the redirect is not followed. For example, in a browser http://www.amerisourcebergendrug.com automatically redirects to http://www.amerisourcebergendrug.com/abcdrug/ via its meta refresh URL, but jsoup stays on http://www.amerisourcebergendrug.com instead of following the redirect and fetching http://www.amerisourcebergendrug.com/abcdrug/.

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").get(); 

I also tried using

 Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").followRedirects(true).get(); 

but neither works.

Is there a workaround for this?

Update: the page may be redirecting with a meta refresh tag rather than an HTTP redirect.

2 answers

Update (case insensitive and fairly fault tolerant)


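    import java.net.URI;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public static void main(String[] args) throws Exception {
        URI uri = URI.create("http://www.amerisourcebergendrug.com");
        Document d = Jsoup.connect(uri.toString()).get();

        for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {
            // content is either "<seconds>; url=<target>" or just "<seconds>"
            Matcher m = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+")
                    .matcher(refresh.attr("content"));

            // find the first one that is valid
            if (m.matches()) {
                if (m.group(1) != null)
                    d = Jsoup.connect(uri.resolve(m.group(1)).toString()).get();
                break;
            }
        }

        // print the URL the document was finally loaded from
        System.out.println(d.baseUri());
    }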

The output is correct:

 http://www.amerisourcebergendrug.com/abcdrug/ 
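If you want to check the parsing step without hitting the network, here is a minimal sketch that isolates it using only the JDK. It reuses the regex from the code above and resolves the target against the base URI. The class name `MetaRefreshTarget`, the method `resolve`, the quote stripping, and the sample content string `0; URL=/abcdrug/` are my assumptions for illustration; I have not checked what the real page actually sends.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: parses the "content" attribute
// of a <meta http-equiv="refresh"> tag.
class MetaRefreshTarget {
    // Same pattern as above: "<seconds>; url=<target>" or just "<seconds>".
    private static final Pattern REFRESH =
            Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+");

    /** Returns the resolved redirect target, or empty if the content names no URL. */
    static Optional<URI> resolve(URI base, String content) {
        Matcher m = REFRESH.matcher(content.trim());
        if (!m.matches() || m.group(1) == null) {
            return Optional.empty();
        }
        // Assumption: some pages wrap the URL in quotes, e.g. url='/abcdrug/'.
        String target = m.group(1).trim().replaceAll("^['\"]|['\"]$", "");
        return Optional.of(base.resolve(target));
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.amerisourcebergendrug.com");
        // Assumed sample values, chosen only to exercise both branches.
        System.out.println(resolve(base, "0; URL=/abcdrug/"));
        System.out.println(resolve(base, "5"));
    }
}
```

With jsoup you would pass `refresh.attr("content")` as the second argument and feed the result to `Jsoup.connect(...)`. Returning an `Optional` keeps the "refresh without a URL" case (a plain page reload) separate from a real redirect.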

Old answer:

Are you sure it doesn't work? For me:

 System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri()); 

prints http://www.ibm.com/us/en/ as expected.


To handle errors and case sensitivity better:

    // needs java.io.IOException, org.jsoup.Jsoup,
    // org.jsoup.nodes.Document and org.jsoup.nodes.Element
    static String getTitle() {
        try {
            Document doc = Jsoup.connect("http://www.ibm.com").get();
            // Check each <meta> in turn: Elements.attr() only reads the first match.
            for (Element meta : doc.select("html head meta")) {
                if (meta.attr("http-equiv").toLowerCase().contains("refresh")) {
                    // Split on the first '=' only, so a query string in the target survives.
                    String[] lvContentArray = meta.attr("content").split("=", 2);
                    if (lvContentArray.length > 1)
                        doc = Jsoup.connect(lvContentArray[1].trim()).get();
                    break;
                }
            }
            // get page title
            return doc.title();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
