I had a small package for cleaning up Google Ngram data, but Google switched the site to SSL and my package broke. Switching from readLines to getURL gets me part of the way, but some of the scripts embedded in the page are missing from the result. Do I need to worry about user agents or something similar?
Here is what I have tried so far (quite simple):
library(RCurl)
myurl <- "https://books.google.com/ngrams/graph?content=hacker&year_start=1950&year_end=2000"
getURL(myurl)
When I compare this result with the page source shown in a browser for the same URL, the key content is missing from what R receives. In the browser, the source contains a block that looks like this:
<script type="text/javascript">
var data = [{"ngram": "hacker", "type": "NGRAM", "timeseries": [9.4930387994907051e-09,
1.1685493106483591e-08, 1.0784501440023556e-08, 1.0108472218003532e-08,
and so on.
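For context, this is roughly how I would pull the numbers out once that script block is actually present in the returned text. It is only a sketch: `page` here is a hypothetical, shortened stand-in for the real source (the first three values shown above, closed off by hand), and the regular expression assumes the "timeseries" array contains no nested brackets.

```r
# Hypothetical stand-in for the page source; the real series is much longer
# and the closing "]}];" is assumed, not copied from the site.
page <- paste(
  '<script type="text/javascript">',
  'var data = [{"ngram": "hacker", "type": "NGRAM", "timeseries": [9.4930387994907051e-09,',
  '1.1685493106483591e-08, 1.0784501440023556e-08]}];',
  '</script>',
  sep = "\n"
)

# Grab the "timeseries": [ ... ] fragment (assumes no nested brackets inside it)
m <- regmatches(page, regexpr('"timeseries":[[:space:]]*\\[[^]]*\\]', page))

# Strip the key and the surrounding brackets, leaving a comma-separated list
inner <- sub('^"timeseries":[[:space:]]*\\[', "", m)
inner <- sub("\\]$", "", inner)

# Split on commas, trim whitespace and newlines, convert to numeric
values <- as.numeric(trimws(strsplit(inner, ",")[[1]]))
print(values)
```

This part works on text that already contains the script block, so the open problem is only getting getURL to return that block in the first place.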
Any suggestions would be greatly appreciated!