I have a huge dataset of 1.5 billion spatial lines that I created using all combinations of 37,000 points. For each spatial line, I would like to extract the maximum value of the polygon (or raster, whichever is faster) that the line touches. This is essentially a very large "spatial join" in Arc lingo. If the lines are overlaid on the polygon layer, the output will be the maximum value along each spatial line for every attribute field, each of which represents one month of one year. I have also included a raster dataset, created from the polygon file for January 1990 only, at a resolution of ~30 m. The raster is an alternative approach that I thought might save time. The polygon and raster layers cover a large spatial area: approximately 30 km x 10 km. Data is available here. The spatial line dataset included in the .zip has only 9,900 rows, sampled from the full 1.5 billion row dataset.
First, read in the data:
library(maptools)  # readShapePoly(), readShapeLines()
library(rgdal)     # readGDAL()
library(raster)    # raster(), extract()
poly<-readShapePoly("ls_polys_bin",proj4string=CRS("+proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs"))
poly$SP_ID<-NULL
raster.jan90<-readGDAL("rast_jan90.tif")
raster.jan90<-raster(raster.jan90)
lines<-readShapeLines("l_spatial",proj4string=CRS("+proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs"))
For illustration, take a random sample of 50 lines and plot them:
lines.50<-lines[sample(nrow(lines),50),]
plot(raster.jan90)#where green=1
plot(poly, axes=T,cex.axis=0.75, add=T)
plot(lines.50, col="red", add=TRUE)
With the overlay approach, the 1.5 billion lines would take about 844 ... to process.
ptm <- proc.time()
overlays.all<-over(lines.50,poly, fn=max)
ptm.sec.overlay<-proc.time() - ptm
ptm.sec.overlay
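The timing above covers only the 50-line sample. A rough way to scale it to the full dataset is a linear extrapolation, sketched below. The function name `extrapolate_runtime` and the example timing of 2 seconds are made up for illustration; the sketch assumes the cost per line stays constant, which may not hold for 1.5 billion lines.

```r
# Rough linear extrapolation of a sample timing to the full dataset.
# Function name and example numbers are hypothetical, for illustration only.
extrapolate_runtime <- function(sample_seconds, n_sample, n_total) {
  per_line <- sample_seconds / n_sample      # average seconds per line
  total_seconds <- per_line * n_total        # assumes constant cost per line
  c(seconds = total_seconds,
    hours   = total_seconds / 3600,
    days    = total_seconds / 86400)
}

# Made-up example: 2 seconds for 50 lines, scaled to 1.5 billion lines
est <- extrapolate_runtime(sample_seconds = 2, n_sample = 50, n_total = 1.5e9)
print(est)
```

With the real measurement, the first argument would be `ptm.sec.overlay[["elapsed"]]`.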
Raster extraction (one month only: January 1990), timed the same way:
ptm <- proc.time()
ext.rast.jan90<-extract(raster.jan90,lines.50, fun=max, method="simple")
ptm.sec.ext<-proc.time() - ptm
ptm.sec.ext
The output contains "0" and "NA" values. Can extract() be made faster? The values are only "1" or "0", ... 0: 300.
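One possible source of the "NA" results, assuming some raster cells under a line are NA (for example, cells outside the polygon footprint): `max()` returns NA as soon as any input is NA, unless `na.rm = TRUE` is set. The base R sketch below uses made-up cell values, not real output from `extract()`.

```r
# Hypothetical cell values picked up along one line (made-up data)
cells_along_line <- c(0, 1, NA, 0)

# Default behaviour: a single NA makes the maximum NA
max_default <- max(cells_along_line)

# Dropping NA cells first gives the maximum of the remaining values
max_narm <- max(cells_along_line, na.rm = TRUE)

# If every cell is NA, max(..., na.rm = TRUE) returns -Inf with a warning
max_all_na <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))

print(max_default)
print(max_narm)
print(max_all_na)
```

If this is the cause, passing `na.rm=TRUE` to `extract()` alongside `fun=max` may be worth trying on the sample.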