getURL (from the RCurl package) does not work in a loop

I have a URL list called URLlist, and I loop over it to get the source code for each of these URLs:

 for (k in 1:length(URLlist)) {
   temp = getURL(URLlist[k])
 }

The problem is that at some random URL the code gets stuck and I get an error:

 Error in function (type, msg, asError = TRUE) : transfer closed with outstanding read data remaining 

But when I call getURL outside the loop with the URL that caused the problem, it works fine.

Any help, please? Thank you very much.

1 answer

It is difficult to say for sure without more information, but the requests may simply be going out too quickly. If so, a short pause between requests can help:

 for (k in 1:length(URLlist)) {
   temp = getURL(URLlist[k])
   Sys.sleep(0.2)
 }

I assume that your actual code does something with "temp" before overwriting it on each iteration of the loop, and that whatever it does is very fast.
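If the pages need to be kept rather than overwritten, one option is to collect them in a list. The sketch below is only an illustration: `fetch_all` and its `fetch` and `pause` arguments are hypothetical names, and it assumes `URLlist` is a character vector. The fetching function is passed in as an argument, so `RCurl::getURL` can be supplied for real use.

```r
# Hypothetical helper (not part of RCurl): fetch every URL in `urls`,
# pausing `pause` seconds after each request, and keep all results.
fetch_all <- function(urls, fetch, pause = 0.2) {
  pages <- vector("list", length(urls))  # preallocate one slot per URL
  names(pages) <- urls
  for (k in seq_along(urls)) {           # seq_along() is safe for empty input
    pages[[k]] <- fetch(urls[k])
    Sys.sleep(pause)
  }
  pages
}
```

With RCurl loaded, this could be called as `pages <- fetch_all(URLlist, getURL)`, after which `pages[[1]]` holds the source of the first URL.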

You can also add some error handling so that one failed request does not kill the whole loop. Here's a crude example that tries to fetch each URL twice before giving up:

 for (url in URLlist) {
   temp = try(getURL(url))
   if (inherits(temp, "try-error")) {
     # first attempt failed: try once more
     temp = try(getURL(url))
     if (inherits(temp, "try-error")) temp = paste("error accessing", url)
   }
   Sys.sleep(0.2)
 }
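The same retry idea can be wrapped in a small function so the number of attempts is not hard-coded. This is a rough sketch, not a tested recipe for this particular error: `fetch_with_retry`, `max_tries`, and `pause` are hypothetical names, and `flaky_fetch` is a stand-in that fails once so the example runs without a network connection.

```r
# Hypothetical helper: call fetch(url) up to `max_tries` times,
# sleeping `pause` seconds after each failed attempt.
fetch_with_retry <- function(url, fetch, max_tries = 3, pause = 0.2) {
  for (attempt in seq_len(max_tries)) {
    # tryCatch returns the condition object instead of stopping on error
    result <- tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    Sys.sleep(pause)
  }
  NA_character_  # every attempt failed
}

# Stand-in fetcher for demonstration: fails on its first call, then succeeds.
calls <- 0
flaky_fetch <- function(url) {
  calls <<- calls + 1
  if (calls == 1) stop("transfer closed with outstanding read data remaining")
  paste("content of", url)
}

page <- fetch_with_retry("http://example.com", flaky_fetch, pause = 0)
```

For real requests, something like `pages <- lapply(URLlist, fetch_with_retry, fetch = getURL)` would apply it to every URL; entries that are `NA` mark URLs that failed on every attempt.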