R: extract address using getURL()

I have a ton of Google Maps URLs and want to get a clean address out of each geocoding URL. I recently found getURL() in the RCurl package, which returns a lot of information:

library(RCurl)

getURL("https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US")

but all I'm really interested in is the address fragment located towards the end of the getURL() output:

... <meta content=\"loc: 2440 Seattle, 98116 WA US - Google Maps\" property=\"og:title\"> ...

Update: I just realized that the URL above is a bad example; here is another one:

getURL("https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US")

... <meta content=\"loc: 10056 Interlake Ave N seattle WA US - Google Maps\" property=\"og:title\"> ...

Does anyone have suggestions for an efficient solution to this? I'm at an intermediate level with R and would be grateful for your help. Thanks!!

Tim

1 answer

Use the Google Maps XML API as follows:

require(XML)

burl <- "http://maps.google.com/maps/api/geocode/xml?address="
address <- "2440 Seattle, 98116 WA US"
request <- paste0(burl,URLencode(address))

doc <- htmlTreeParse(request, useInternalNodes=TRUE)
# Interpreted address
xmlValue(doc[["//formatted_address"]])
[1] "2440, Seattle-Tacoma International Airport (SEA), Seattle, WA 98158, USA"
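Since the question mentions many addresses, here is a minimal sketch of building one request URL per address with base R only. The helper name build_requests is hypothetical, and using reserved = TRUE (so that characters such as "," or "&" inside an address are percent-encoded too) is my own assumption rather than part of the code above.

```r
# Hypothetical helper: build one geocoding request URL per address.
# Assumes the same base URL as in the answer above.
build_requests <- function(addresses,
                           burl = "http://maps.google.com/maps/api/geocode/xml?address=") {
  vapply(addresses, function(a) {
    # reserved = TRUE also encodes reserved characters in the address,
    # so they cannot be mistaken for query-string syntax
    paste0(burl, URLencode(a, reserved = TRUE))
  }, character(1), USE.NAMES = FALSE)
}

addresses <- c("2440 Seattle, 98116 WA US",
               "10056 Interlake Ave N seattle WA US")
requests <- build_requests(addresses)
print(requests)
```

Each element of requests could then be passed to htmlTreeParse() in a loop or with lapply(), exactly as in the single-address example.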

EDIT
If you only have the encoded URL, use URLdecode to decode it instead of loading the URL:

URL <- "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US"
URL <- gsub(".*loc","",URL) # Get rid of https://...
URL <- URLdecode(URL)
gsub("[:]|[+]", " ", URL) # Get rid of ":" and "+"
[1] "  2440 Seattle, 98116 WA US"
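The same decoding idea can be wrapped in a small function for a whole vector of URLs. This is only a sketch: the name extract_address is hypothetical, and it assumes every URL has the form shown in the question, with the address in a trailing q=loc... parameter and no further parameters after it.

```r
# Hypothetical helper: decode the "q=loc..." part of Google Maps URLs.
# Assumes "q=" is the last query parameter, as in the question's examples.
extract_address <- function(urls) {
  vapply(urls, function(u) {
    query <- sub(".*[?&]q=", "", u)               # keep everything after "q="
    query <- gsub("+", " ", query, fixed = TRUE)  # "+" stands for a space here
    decoded <- URLdecode(query)                   # "%3A" -> ":", "%32" -> "2", ...
    decoded <- sub("^loc:", "", decoded)          # drop the "loc:" prefix
    trimws(decoded)                               # remove leading/trailing blanks
  }, character(1), USE.NAMES = FALSE)
}

urls <- c(
  "https://maps.google.com/?q=loc%3A+%32%34%34%30+Seattle%2C+%39%38%31%31%36+WA+US",
  "https://maps.google.com/?q=loc%3A+%31%30%30%35%36+Interlake+Ave+N+seattle+WA+US"
)
addresses <- extract_address(urls)
print(addresses)
```

Replacing "+" before calling URLdecode keeps any literal plus sign that was encoded as %2B intact, and trimws() removes the leading blanks seen in the output above.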