Overlay polygons more efficiently, or extract() raster data along spatial lines

I have a huge dataset of 1.5 billion spatial lines that I created using all combinations of 37,000 points. For each spatial line, I would like to extract the maximum value of the polygon (or raster, whichever is faster) that the line touches. This is essentially a very large "spatial join" in Arc lingo. If the lines are overlaid on the polygon layer, the output will be the maximum value along each spatial line for every attribute field, each of which is one month of one year. I also included a raster dataset that was created from the polygon file for January 1990 only, at a resolution of ~30 m. The raster is an alternative approach, which I thought could save time. The polygon and raster layers cover a large spatial area: approximately 30 km x 10 km. Data is available here. The spatial line dataset that I included in the .zip has only 9900 rows, sampled from the entire 1.5 billion row dataset.

Read the data first

library(maptools) # readShapePoly(), readShapeLines()
library(rgdal)    # readGDAL()
library(raster)   # raster(), extract()

# polygons
poly <- readShapePoly("ls_polys_bin", proj4string=CRS("+proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs"))
poly$SP_ID <- NULL # drop this extra field in preparation for the overlay

# raster: created from the polygon layer, one month only (January 1990)
raster.jan90 <- readGDAL("rast_jan90.tif")
raster.jan90 <- raster(raster.jan90) # convert to a RasterLayer

# lines (9900 of the 1.5 billion are included)
lines <- readShapeLines("l_spatial", proj4string=CRS("+proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs"))

To visualize, take a sample of 50 lines and plot them:

 lines.50<-lines[sample(nrow(lines),50),]

plot(raster.jan90)#where green=1
plot(poly, axes=T,cex.axis=0.75, add=T)
plot(lines.50, col="red", add=TRUE)

Next, the polygon overlay. Extrapolated to all 1.5 billion lines, this timing works out to roughly 844 days.

 ptm <- proc.time() #start clock
 overlays.all<-over(lines.50,poly, fn=max)
 ptm.sec.overlay<-proc.time() - ptm # stop clock
 ptm.sec.overlay #.56 sec w/ n=12 lines; 2.3 sec w/ 50 lines

The raster approach (one month only, January 1990), using extract(), is even slower.

 ptm <- proc.time() # Start clock
 ext.rast.jan90<-extract(raster.jan90,lines.50, fun=max, method="simple")
 ptm.sec.ext<-proc.time() - ptm # stop clock
 ptm.sec.ext #32 sec w/ n=12 lines; 191 sec w/ n=50 lines

"0" "NA" . extract() ? , "1" "0", , 0: 300.


, . , (getCrds ), , ( , , ).

library(raster)
raster.jan90 <- raster("rast_jan90.tif") 
lines <- shapefile("l_spatial.shp", p4s="+proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs")  
lines.50<-lines[sample(nrow(lines),50),]

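test <- function(lns) {

  # For line i, sample points along the segment at roughly one per raster
  # cell and return a two-column matrix: line index, raster cell number.
  getCrds <- function(i) {
    p <- z[[i]][[1]]                             # 2 x 2 matrix of end points
    s <- (p[2,] - p[1,]) / res(raster.jan90)     # segment length in cells (x, y)
    step <- round(max(abs(s)))
    if ( step < 1 ) {
        # these probably should not exist, but they do
        return( cbind(i, cellFromXY(raster.jan90, p[1, , drop=FALSE])) )
    }
    x <- seq(p[1,1], p[2,1], length.out=step)
    y <- seq(p[1,2], p[2,2], length.out=step)
    cbind(i, unique(cellFromXY(raster.jan90, cbind(x, y))))
  }

  z <- coordinates(lns)
  # lapply always returns a list, so rbind works even if all results have the same shape
  crd <- lapply(seq_along(z), getCrds)
  crd <- do.call(rbind, crd)

  e <- extract(raster.jan90, crd[, 2])   # raster values by cell number
  tapply(e, crd[,1], max)                # maximum value per line
}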

system.time(res <- test(lines.50))
#  user  system elapsed 
#  0.53    0.01    0.55 

system.time(res <- test(lines))
#  user  system elapsed 
#  59.72    0.85   60.58 

So the full job would take about (684481500 * 60.58 / n) / (3600 * 24) ≈ 50 days, where n = 9900 is the number of lines in the sample ...

Run in parallel on 50 machines or cores, that would be about 1 day.
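The estimate can be checked with a few lines of base R. This is only a sketch of the arithmetic: the divisor 9900 is assumed to be the number of lines in the sample file, on which the 60.58-second timing was measured, and the variable names are illustrative.

```r
# Number of unordered pairs among 37,000 points
n_pairs <- choose(37000, 2)

# Measured: 60.58 s elapsed for the 9900-line sample (assumed divisor)
sec_per_line <- 60.58 / 9900

# Extrapolated run time in days for all pairs, single process
days <- n_pairs * sec_per_line / (3600 * 24)
round(days, 1)
```

The pair count comes out to 684481500, the figure used in the estimate, and the run time to a little under 50 days.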

, ( ).


, , .

R. C, 37 000 , Bresenham, , ​​ , , . , . , ?
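Since Bresenham is mentioned above, here is a minimal base-R sketch of Bresenham's line algorithm on a small matrix standing in for the raster. It is an illustration only, not the answerer's implementation: the names bresenham and grid are hypothetical, and the cell indices are assumed to be integers already (x = column, y = row).

```r
# Bresenham's line algorithm: integer grid cells from (x0, y0) to (x1, y1).
bresenham <- function(x0, y0, x1, y1) {
  x0 <- as.integer(x0); y0 <- as.integer(y0)
  x1 <- as.integer(x1); y1 <- as.integer(y1)
  dx <- abs(x1 - x0)
  dy <- abs(y1 - y0)
  sx <- if (x0 < x1) 1L else -1L   # step direction in x
  sy <- if (y0 < y1) 1L else -1L   # step direction in y
  err <- dx - dy
  n <- max(dx, dy) + 1L            # one cell per step along the major axis
  cells <- matrix(0L, nrow = n, ncol = 2, dimnames = list(NULL, c("x", "y")))
  for (k in seq_len(n)) {
    cells[k, ] <- c(x0, y0)
    e2 <- 2L * err
    if (e2 > -dy) { err <- err - dy; x0 <- x0 + sx }
    if (e2 < dx)  { err <- err + dx; y0 <- y0 + sy }
  }
  cells
}

# Toy 4 x 4 grid (hypothetical data) with a single cell set to 1
grid <- matrix(0, nrow = 4, ncol = 4)
grid[2, 3] <- 1                    # row 2 (y), column 3 (x)

cells <- bresenham(1, 1, 4, 2)
vals <- grid[cbind(cells[, "y"], cells[, "x"])]
line_max <- max(vals)
```

Note that Bresenham visits one cell per step along the longer axis, so it can skip cells that the line only clips at a corner; whether that is acceptable depends on how strictly "touches" should be interpreted.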

, .

1000 Amazon ( ) .
