G'day, I'm working with a large dataset of ~125,000 lon/lat locations, each with a date, holding presence/absence records for a species. For each location I want to find out what the weather was on the record date and during the 3 months before it. To do this, I downloaded daily meteorological data for a given weather variable (for example, maximum temperature) covering the 5-year period over which the records were collected. That gives a total of 1,826 raster files, each 2-3 MB.
I planned to stack all the raster files and then extract the value from every raster (1,826) for each point. This would create one massive file that I could use to look up the dates I need. This, however, is not possible, because I cannot stack that many rasters. I tried splitting the rasters into stacks of 500; that works, but each file it produces is about 1 GB (125,000 rows by 500 columns) and the process is very slow. Also, when I try to read all these files back into R to create one large data frame, it fails.
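To make the lookup step concrete, here is a minimal base-R sketch of how a window of days could be pulled out of such a points-by-days table. It assumes the columns are in date order with one column per day; `start_date`, `obs_date`, `window_days` and the toy matrix `vals` are hypothetical names and values for illustration, not part of my real data.

```r
# Toy stand-in for the extracted table: 3 points x 10 daily layers.
# (The real table would be 125,000 x 1,826.)
start_date  <- as.Date("2000-01-01")            # assumed date of the first raster
vals        <- matrix(1:30, nrow = 3, ncol = 10)
obs_date    <- as.Date(c("2000-01-05", "2000-01-10", "2000-01-02"))
window_days <- 3                                # hypothetical look-back, in days

# Column index of each record's own date (1 = first raster).
day_idx <- as.integer(obs_date - start_date) + 1L

# Mean over the window ending on the record date, clipped at the first layer.
window_mean <- vapply(seq_along(day_idx), function(i) {
  cols <- max(1L, day_idx[i] - window_days):day_idx[i]
  mean(vals[i, cols])
}, numeric(1))

# Value on the record date itself, via matrix indexing by (row, column).
on_date <- vals[cbind(seq_along(day_idx), day_idx)]
```

The same indexing would work for any summary (max, sum, the raw values) by swapping `mean` for another function.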
I would like to know if there is a way to work with this amount of data in R, or if there is a package that could help. Could I use a package like ff? Does anyone have a suggestion for a less memory-intensive way to do what I want? I thought of something like a loop, but I have never written one before and I'm not sure where to start.
Any help would be greatly appreciated; thanks in advance for your time. The code I'm currently using, without success, is below.
Regards, Adam
library(raster)
library(rgdal)
library(maptools)
library(shapefiles)
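# Match only files ending in .asc (a bare 'asc' would also match e.g. .asc.aux.xml sidecar files)
files <- list.files(getwd(), pattern = '\\.asc$')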
length(files)
memory.size(4000)
memory.limit(4000)
X<-read.table(file.choose(), header=TRUE, sep=',')
SP<- SpatialPoints(cbind(X$lon, X$lat))
s1<- stack(files[1:500])
i1 <- extract(s1, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i1, file="maxt_vals_all_points_all_dates_1.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s1,i1)
s2<- stack(files[501:1000])
i2 <- extract(s2, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i2, file="maxt_vals_all_points_all_dates_2.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s2,i2)
s3<- stack(files[1001:1500])
i3 <- extract(s3, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i3, file="maxt_vals_all_points_all_dates_3.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s3,i3)
s4<- stack(files[1501:1826])
# files[1501:1826] holds 326 layers, so nl must be 326 to include the last one
i4 <- extract(s4, SP, cellnumbers = TRUE, layer = 1, nl = 326)
write.table(i4, file="maxt_vals_all_points_all_dates_4.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s4,i4)
i1<-read.table(file.choose(),header=TRUE,sep=',')
i2<-read.table(file.choose(),header=TRUE,sep=',')
i3<-read.table(file.choose(),header=TRUE,sep=',')
i4<-read.table(file.choose(),header=TRUE,sep=',')
vals <- data.frame(X, i1, i2, i3, i4)
write.table(vals, file="maxt_master_lookup.csv", sep=",", row.names= FALSE, col.names= TRUE)
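
For comparison, the loop idea mentioned above might look roughly like the sketch below: it opens one raster at a time and fills one column of a preallocated matrix, so no stack and no intermediate CSV files are needed. The function name `extract_by_file` is hypothetical, and the sketch assumes `pts` is a two-column lon/lat matrix such as `cbind(X$lon, X$lat)`. I have not run it on the full dataset; note that a 125,000 x 1,826 numeric matrix needs on the order of 1.8 GB of RAM, so it may still have to be processed in blocks of files.

```r
library(raster)

# Extract values for all points from each file in turn.
# files: character vector of raster paths, assumed to be in date order
# pts:   two-column numeric matrix of coordinates (lon, lat)
extract_by_file <- function(files, pts) {
  out <- matrix(NA_real_, nrow = nrow(pts), ncol = length(files),
                dimnames = list(NULL, basename(files)))
  for (j in seq_along(files)) {
    r <- raster(files[j])         # open a single layer
    out[, j] <- extract(r, pts)   # one value per point for this day
  }
  out
}

# Possible usage with the objects from the script above:
# vals <- extract_by_file(files, cbind(X$lon, X$lat))
```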