Is it possible to download all zip files from a web page without specifying the individual links one at a time?
I want to download all monthly zip files from http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html .
I am using Windows 8.1 and R 3.1.1. I don't have wget on the PC, so I cannot use a recursive call.
Alternative:
As a workaround, I tried to download the text of the web page itself, with the idea of extracting the name of each zip file and then passing the names to download.file in a loop. However, I am struggling to extract the names.
pth <- "http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html"
temp <- tempfile()
download.file(pth,temp)
dat <- readLines(temp)
unlink(temp)
g <- dat[grepl("accounts_monthly", tolower(dat))]
g contains character strings with the file names embedded among other characters:
g
[1] " <li><a href=\"Accounts_Monthly_Data-September2013.zip\">Accounts_Monthly_Data-September2013.zip (775Mb)</a></li>"
[2] " <li><a href=\"Accounts_Monthly_Data-October2013.zip\">Accounts_Monthly_Data-October2013.zip (622Mb)</a></li>"
I would like to extract just the file names, e.g. Accounts_Monthly_Data-September2013.zip, but my regular expression attempt below does not work:
gsub(".*\\>(\\w+\\.zip)\\s+", "\\1", g)
Data:

g <- c(" <li><a href=\"Accounts_Monthly_Data-September2013.zip\">Accounts_Monthly_Data-September2013.zip (775Mb)</a></li>",
" <li><a href=\"Accounts_Monthly_Data-October2013.zip\">Accounts_Monthly_Data-October2013.zip (622Mb)</a></li>"
)
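For what it's worth, one possible fix (a sketch, not necessarily the only approach): instead of matching `\w+`, which excludes the hyphen in the file names, match the contents of the `href` attribute directly. This assumes every relevant line contains exactly one `href="....zip"`, as in the sample above, and that the zip links are relative to the site root (the download URL prefix below is an assumption based on the page URL in the question):

```r
# Reproducible sample lines from the page source
g <- c(" <li><a href=\"Accounts_Monthly_Data-September2013.zip\">Accounts_Monthly_Data-September2013.zip (775Mb)</a></li>",
       " <li><a href=\"Accounts_Monthly_Data-October2013.zip\">Accounts_Monthly_Data-October2013.zip (622Mb)</a></li>")

# Capture everything between href=" and the closing quote that ends in .zip;
# [^"]+ allows hyphens and digits, which \w+ does not
files <- gsub('.*href="([^"]+\\.zip)".*', "\\1", g)
files
# [1] "Accounts_Monthly_Data-September2013.zip" "Accounts_Monthly_Data-October2013.zip"

# Then (untested sketch) download each file in a loop; mode = "wb" matters
# on Windows for binary files such as zips:
# base <- "http://download.companieshouse.gov.uk/"
# for (f in files) download.file(paste0(base, f), destfile = f, mode = "wb")
```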