rvest: Error in open.connection(x, "rb") : Timeout was reached

I am trying to scrape the contents of http://google.com, but an error message appears.

    library(rvest)
    html("http://google.com")

    Error in open.connection(x, "rb") : Timeout was reached
    In addition: Warning message:
    'html' is deprecated.
    Use 'read_html' instead.
    See help("Deprecated")

Since I am connecting through a network, this could be caused by a firewall or proxy. I tried using set_config, but it did not work.

+7
r rvest
3 answers

I encountered the same problem, Error in open.connection(x, "rb") : Timeout was reached, when working behind a proxy server on an office network.

Here is what worked for me:

    library(rvest)
    url <- "http://google.com"
    # Download the page to a local file first, then parse the saved copy
    download.file(url, destfile = "scrapedpage.html", quiet = TRUE)
    content <- read_html("scrapedpage.html")

Credit: fooobar.com/questions/831958 / ...
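The download-then-parse workaround above can be wrapped in a small helper. This is only a sketch: `fetch_page` and its `timeout_sec` argument are hypothetical names, not part of rvest. It assumes `download.file()` can get through the proxy in your environment, and it relies on `download.file()` honouring the `timeout` option.

```r
library(rvest)

# Hypothetical helper (not part of rvest): download a page to a temporary
# file with a longer timeout, then parse the local copy.
fetch_page <- function(url, timeout_sec = 120) {
  old <- options(timeout = timeout_sec)   # download.file() honours this option
  on.exit(options(old), add = TRUE)       # restore the previous timeout

  tmp <- tempfile(fileext = ".html")
  on.exit(unlink(tmp), add = TRUE)        # remove the temporary file afterwards

  download.file(url, destfile = tmp, quiet = TRUE)
  read_html(tmp)                          # parsed while the file still exists
}
```

It would be called as `content <- fetch_page("http://google.com")`. Using a temporary file avoids leaving scrapedpage.html in the working directory.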

+8

This is probably a problem with your read_html call (or html in your case) not properly identifying itself to the server it is trying to retrieve content from, which is the default behavior. Using curl, pass a handle with a user agent to read_html so that your scraper identifies itself.

    library(rvest)
    library(curl)
    read_html(curl('http://google.com', handle = curl::new_handle("useragent" = "Mozilla/5.0")))
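The same idea can be split into two small functions so the handle is reusable across requests. This is a sketch, not a definitive fix: `make_handle` and `read_with_handle` are hypothetical names, and the 30-second connect timeout is an assumed value that you may need to adjust for your network.

```r
library(rvest)
library(curl)

# Hypothetical helper: build a curl handle that sends a user agent and
# allows more time to establish the connection.
make_handle <- function(user_agent = "Mozilla/5.0", connect_timeout = 30) {
  new_handle(useragent = user_agent, connecttimeout = connect_timeout)
}

# Hypothetical helper: read a page through a curl connection using that handle.
read_with_handle <- function(url, handle = make_handle()) {
  read_html(curl(url, handle = handle))
}
```

It would be called as `page <- read_with_handle("http://google.com")`.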
+3

I ran into this problem because my VPN was turned on. After disabling it, I tried again, and that solved the problem.

0
