Rvest: Error in open.connection(x, "rb") : Timeout was reached

Question

I am trying to scrape the contents of http://google.com, but an error message appears.

library(rvest)
html("http://google.com")

Error in open.connection(x, "rb") : Timeout was reached
In addition: Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated")

Since I am on a network, this may be caused by a firewall or proxy. I tried using set_config, but it did not work.

r rvest

user3267649 Oct 23 '15 at 5:54

3 answers

user799188 · Answer 1 · 2017-03-03T01:46:33+0000

I encountered the same problem, Error in open.connection(x, "rb") : Timeout was reached, when working behind a proxy server on an office network.

Here is what worked for me:

 library(rvest)
 url <- "http://google.com"
 # Download the page to a local file first, then parse the local copy
 download.file(url, destfile = "scrapedpage.html", quiet = TRUE)
 content <- read_html("scrapedpage.html")
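A slightly more defensive variant of the same idea, as a minimal sketch only: it downloads to a temporary file rather than the working directory and raises the download timeout before fetching. The helper name read_html_via_file and the 120-second default are illustrative assumptions, not part of the answer above.

```r
library(rvest)

# Hypothetical helper: the name and the default timeout are illustrative assumptions.
read_html_via_file <- function(url, timeout_sec = 120) {
  # download.file() uses the "timeout" option (in seconds); restore it on exit
  old <- options(timeout = timeout_sec)
  on.exit(options(old), add = TRUE)

  # Save to a temporary file and clean it up once parsing is done
  tmp <- tempfile(fileext = ".html")
  on.exit(unlink(tmp), add = TRUE)

  download.file(url, destfile = tmp, quiet = TRUE, mode = "wb")
  read_html(tmp)
}

# Example call (requires network access):
# page <- read_html_via_file("http://google.com")
```

If the download itself still times out, the proxy or firewall is probably blocking R altogether, and raising the timeout will not help.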

Credit: fooobar.com/questions/831958 / ...

genericgreatape · Answer 2 · 2016-08-04T16:43:50+0000

This is probably an issue with your read_html call (or html in your case) not properly identifying itself to the server it is trying to retrieve content from, which is the default behavior. Using curl, add a user agent to the handle passed to read_html so that your scraper identifies itself.

 library(rvest)
 library(curl)
 read_html(curl('http://google.com', handle = curl::new_handle("useragent" = "Mozilla/5.0")))
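The same user-agent idea can also be sketched with httr, which additionally lets you set an explicit timeout and, if needed, a proxy for the request. As far as I know, httr's set_config only affects requests made through httr, while read_html on a URL opens its own connection, which may be why set_config appeared to have no effect in the question. The helper name fetch_page, the 30-second default, and the proxy host and port in the usage comment are placeholders and assumptions, not values from this thread.

```r
library(rvest)
library(httr)

# Hypothetical helper: name, defaults, and proxy arguments are illustrative assumptions.
fetch_page <- function(url, proxy_host = NULL, proxy_port = NULL, timeout_sec = 30) {
  # Only add proxy settings when a proxy host is supplied
  proxy_cfg <- if (is.null(proxy_host)) {
    config()
  } else {
    use_proxy(proxy_host, port = proxy_port)
  }

  # Make the request through httr so the user agent, timeout, and proxy all apply
  resp <- GET(url, user_agent("Mozilla/5.0"), timeout(timeout_sec), proxy_cfg)
  stop_for_status(resp)

  # Parse the response body as HTML
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Example calls (require network access; the proxy values are placeholders):
# page <- fetch_page("http://google.com")
# page <- fetch_page("http://google.com", proxy_host = "proxy.example.com", proxy_port = 8080)
```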

Brent b · Answer 3 · 2017-09-30T03:49:35+0000

I ran into this problem because my VPN was turned on. Right after disabling it, I tried again, and that solved the problem.
