The problem is probably that your read_html call (or html, in your case) does not identify itself properly to the server it is fetching content from, which is the default behavior. Using curl, pass read_html a connection whose handle has a user agent set, so that your scraper identifies itself.
library(rvest)
library(curl)
read_html(curl('http://google.com', handle = curl::new_handle("useragent" = "Mozilla/5.0")))
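If you scrape more than one page, it may be convenient to wrap the same idea in a small helper. The following is only a sketch: the function names `ua_handle` and `read_html_with_ua` and the default user agent string are made up for illustration and are not part of rvest or curl.

```r
library(rvest)
library(curl)

# Hypothetical helper: build a curl handle that sends the given user agent.
ua_handle <- function(user_agent = "Mozilla/5.0") {
  curl::new_handle(useragent = user_agent)
}

# Hypothetical helper: open the URL through that handle and parse the HTML.
read_html_with_ua <- function(url, user_agent = "Mozilla/5.0") {
  read_html(curl::curl(url, handle = ua_handle(user_agent)))
}

# Example call (needs network access):
# page <- read_html_with_ua("http://google.com")
```

Creating a fresh handle per request keeps the sketch simple. If you need cookies or other settings to persist between requests, you would probably want to create one handle and reuse it instead.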
genericgreatape