How to stop execution of RCurl::getURL() if it takes too long?

Is there a way to tell R or the RCurl package to give up trying to load a web page if loading takes longer than a specified period of time, and move on to the next line of code? For instance:

 > library(RCurl)
 > u = "http://photos.prnewswire.com/prnh/20110713/NY34814-b"
 > getURL(u, followLocation = TRUE)
 > print("next line")  # the program never gets this far

This will just hang on my system and not go to the last line.

EDIT: Based on @Richie Cotton's answer below, I can "sort of" achieve what I want, but I don't understand why it takes longer than expected. For example, if I do the following, the system freezes until I toggle the "Misc → Buffered Output" option in RGui:

 > system.time(getURL(u, followLocation = TRUE, .opts = list(timeout = 1)))
 Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
   Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
 Timing stopped at: 0.02 0.08 ***6.76***

SOLUTION: Based on @Duncan's post below, and then looking at the curl docs, I found a solution using the maxredirs option, as follows:

 > getURL(u, followLocation = TRUE, .opts = list(timeout = 1, maxredirs = 2, verbose = TRUE)) 

Thank you,

Tony Breyal

 O/S: Windows 7
 R version 2.13.0 (2011-04-13)
 Platform: x86_64-pc-mingw32/x64 (64-bit)

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] RCurl_1.6-4.1  bitops_1.0-4.1

 loaded via a namespace (and not attached):
 [1] tools_2.13.0
2 answers

I believe the web server is getting confused: it tells us the URL is temporary and then points us to a new URL:

http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&page=/getStoryRemapDetails.do&prnid=20110713%252fNY34814%252db&action=data

When we follow this, it redirects us again to... the same URL!

So the timeout is not the problem. The response comes back very quickly, so the timeout is never exceeded. It is the fact that we are spinning in a redirect loop that causes the apparent hang.

I found this by adding verbose = TRUE to the .opts list; then we see all the communication between us and the server.
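The same redirect loop can be observed outside R with command-line curl, which exposes the same libcurl options (-v corresponds to verbose, -L to followLocation, --max-redirs to maxredirs). This is just a diagnostic sketch, not part of the original answer, and assumes curl is installed:

```shell
# Follow redirects (-L) but give up after 2 hops (--max-redirs),
# printing each request/response exchange (-v). When the cap is hit,
# curl stops with a "Maximum redirects followed" error instead of
# looping forever.
curl -sS -L -v --max-redirs 2 --max-time 5 \
  "http://photos.prnewswire.com/prnh/20110713/NY34814-b"
```

The -v output shows each Location: header the server sends, which is how the loop between the two URLs becomes visible.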

D.


timeout and connecttimeout are curl options, so you need to pass them in a list via the .opts argument of getURL. I'm not sure which of the two you need, but start with

 getURL(u, followLocation = TRUE, .opts = list(timeout = 3)) 
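The difference between the two options can be seen with their command-line curl equivalents (--connect-timeout for connecttimeout, --max-time for timeout). This is a sketch for comparison only, assuming curl is installed:

```shell
# --connect-timeout limits only the TCP connection phase
# (libcurl's connecttimeout option); --max-time caps the entire
# transfer (libcurl's timeout option). A server that accepts
# connections quickly but responds slowly is only caught by the
# second one.
curl -sS --connect-timeout 3 --max-time 10 \
  "http://photos.prnewswire.com/prnh/20110713/NY34814-b"
```

So if the hang happens after the connection is established, timeout (not connecttimeout) is the option to set in .opts.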

EDIT:

I can reproduce the hang; changing the buffered-output setting does not fix it for me (tested in R 2.13.0 and R 2.13.1), and it happens with or without a timeout argument. If you try getURL on the page that is the redirect target, it comes back blank.

 u2 <- "http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&page=/getStoryRemapDetails.do&prnid=20110713%252fNY34814%252db&action=details"
 getURL(u2)

If you remove the page argument, it redirects you to a login page; perhaps PR Newswire is doing something funny by asking for credentials.

 u3 <- "http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&prnid=20110713%252fNY34814%252db&action=details"
 getURL(u3)
