Working with lots of data and lots of rasters in R?

Question

Working with lots of data and lots of rasters in R?

G'day, I work with a large dataset with ~ 125,000 lon / lat places with a date, for records of presence / absence of species. In each place I want to find out what the weather was in each place on the date and within 3 minutes before the date. To do this, I downloaded daily meteorological data for a given weather variable (for example, maximum temperature) during the 5th period when the data was made. I have a total of 1826 raster files, all between 2-3 MB.

I planned to add all the raster files, and then extract the value from each raster (1,826) for each point. This will create a massive file that I can use to find the dates I need. This, however, is not possible because I cannot collect so many rasters. I tried to split the rasters into 500 stacks, it works, but the files it produces are about 1 GB and very slow (rows, 125,000, columns, 500). Also, when I try to cast all these files to R to create a large data frame, it does not work.

I would like to know if there is a way to work with this amount of data in R, or if there is a package that I could use to help. Can I use a package like ff? Does anyone have a suggestion on a less intensive method to do what I want to do? I thought of something like a noodle function, but have never used it before, and I'm not sure where to start.

Any help would be really great, well in advance of your time. The code I'm currently using without success is below.

Regards, Adam

library(raster)
library(rgdal)
library (maptools)
library(shapefiles)

# To create weather data files, first set the working directory to the appropriate location (i.e., maxt)
# list of raster weather files
files<- list.files(getwd(), pattern='asc')
length(files)

memory.size(4000)  
memory.limit(4000)

# read in lon/lat data
X<-read.table(file.choose(), header=TRUE, sep=',')
SP<- SpatialPoints(cbind(X$lon, X$lat)) 

#separate stacks into mannageable sizes
s1<- stack(files[1:500])
i1 <- extract( s1,SP, cellnumbers = True, layer = 1, nl = 500)
write.table(i1, file="maxt_vals_all_points_all_dates_1.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s1,i1)
s2<- stack(files[501:1000])
i2 <- extract( s2,SP, cellnumbers = True, layer = 1, nl = 500)
write.table(i2, file="maxt_vals_all_points_all_dates_2.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s2,i2)
s3<- stack(files[1001:1500])
i3 <- extract( s3,SP, cellnumbers = True, layer = 1, nl = 500)
write.table(i3, file="maxt_vals_all_points_all_dates_3.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s3,i3)
s4<- stack(files[1501:1826])
i4 <- extract( s4,SP, cellnumbers = True, layer = 1, nl =325)
write.table(i4, file="maxt_vals_all_points_all_dates_4.csv", sep=",", row.names= FALSE, col.names= TRUE)
rm(s4,i4)

# read files back in to bind into final file !!! NOT WORKING FILES ARE TOO BIG!!
i1<-read.table(file.choose(),header=TRUE,sep=',')
i2<-read.table(file.choose(),header=TRUE,sep=',')
i3<-read.table(file.choose(),header=TRUE,sep=',')
i4<-read.table(file.choose(),header=TRUE,sep=',')

vals<-data.frame(X, i1, i2, i3 ,i4)
write.table(vals, file="maxt_master_lookup.csv", sep=",", row.names= FALSE, col.names= TRUE)

+5

memory management r large-files raster spatial

Adam Apr 7 '12 at 0:14

source share

2 answers

mdsumner · Answer 1 · 2012-04-07T07:01:56+0000

I would extract one raster file at a time and add the results to the file as it arrives.

, , ( ), "[[" , .

files <- list(volcano, volcano * 2, volcano * 3)
library(sp)
SP <- SpatialPoints(structure(c(0.455921585146703, 0.237608166502031, 0.397704673508124, 0.678393354622703, 0.342820219769366, 0.554888036966903, 0.777351335399613, 0.654684656824567), .Dim = c(4L, 2L)))

library(raster)
for (i in seq_len(length(files))) {

    r <- raster(files[[i]])
    e <- extract(r, SP)
    ## print(e)  ## print for debugging
    write.table(data.frame(file = i, extract = e),"cellSummary.csv", col.names = i == 1, append = i > 1, sep = ",", row.names = FALSE)
}

mmann1123 · Answer 2 · 2016-04-11T23:44:12+0000

. . .

350 , , 32 16- linux. , -!

 # Define Functions
  extract_value_point_polygon = function(point_or_polygon, raster_stack, num_workers){
          # Returns list containing values from locations of spatial points or polygons
          lapply(c('raster','foreach','doParallel'), require, character.only = T)
          registerDoParallel(num_workers)
          ply_result = foreach(j = 1:length(point_or_polygon),.inorder=T) %do%{
                print(paste('Working on feature: ',j,' out of ',length(point_or_polygon)))
                get_class= class(point_or_polygon)[1]
                if(get_class=='SpatialPolygons'|get_class=='SpatialPolygonsDataFrame'){
                    cell = as.numeric(na.omit(cellFromPolygon(raster_stack, point_or_polygon[j], weights=F)[[1]]))}
                if(get_class=='SpatialPointsDataFrame'|get_class=='SpatialPoints'){
                    cell = as.numeric(na.omit(cellFromXY(raster_stack, point_or_polygon[j,])))}
                if(length(cell)==0)return(NA)
                r = rasterFromCells(raster_stack, cell,values=F)
                result = foreach(i = 1:dim(raster_stack)[3],.packages='raster',.inorder=T) %dopar% {
                   crop(raster_stack[[i]],r)
                }
                result=as.data.frame(getValues(stack(result)))
                return(result)
          }
          endCluster()
          return(ply_result)
  }

Working with lots of data and lots of rasters in R?

More articles: