Efficiently interpolating values from a grid in R

I have a grid of ocean depth data by location, and I'm trying to interpolate depth values at a set of selected GPS points.

We've been using RSAGA::pick.from.points, which works well for small datasets.

require(RSAGA)

depthdata <- cbind.data.frame(x=c(-74.136, -74.135, -74.134, -74.133, -74.132,
                                  -74.131, -74.130, -74.129, -74.128, -74.127),
                              y=rep(40, times=10),
                              depth=c(-0.6, -0.6, -0.9, -0.9, -0.9,
                                      -0.9, -0.9, -0.9, -0.6, -0.6))

mylocs <- rbind(c(-74.1325, 40), c(-74.1305, 40))
colnames(mylocs) <- c("x", "y")

results <- pick.from.points(data=mylocs, src=depthdata,
                            pick=c("depth"), method="nearest.neighbour")
mydepths <- results$depth

But our depth dataset contains 69 million points, and we have 5 million GPS points for which we need depth estimates; pick.from.points simply takes too long (> 2 weeks) on data of that size. We suspect this task could be completed faster in MATLAB or ArcMap, but we're trying to keep it inside a longer R workflow that we're writing for other people to run repeatedly, so switching to proprietary software for one part of the workflow is undesirable.

We would be willing to sacrifice some degree of accuracy for speed.

I've searched for a solution as best I can, but I'm fairly new to gridded data and interpolation, so I may be using the wrong vocabulary and missing a simple answer.


1 answer

If you're willing to impute by finding the nearest neighbour and using its value, I think the trick is to use an efficient nearest-neighbour implementation, one that can find the nearest neighbour among n alternatives in O(log(n)) time. A k-d tree provides that kind of performance and is available in R through the FNN package. While the computation (on randomly generated data with 69 million reference points and 5 million query points) isn't instantaneous, taking about 3 minutes, it's a lot faster than 2 weeks!

data <- cbind(x=rnorm(6.9e7), y=rnorm(6.9e7))  # simulated reference grid (69 million points)
labels <- rnorm(6.9e7)                         # simulated depth values for the grid
query <- cbind(x=rnorm(5e6), y=rnorm(5e6))     # simulated GPS query points (5 million)

library(FNN)

get.nn <- function(data, labels, query) {
  # k-d tree nearest-neighbour search: one neighbour per query point
  nns <- get.knnx(data, query, k=1)
  labels[nns$nn.index]
}

system.time(get.nn(data, labels, query))
#    user  system elapsed
# 174.975   2.236 177.617

As a caveat, the process used about 10 GB of RAM, so you will need substantial memory resources to run it on a dataset of your size.
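
As a usage sketch (not part of the original answer), the same lookup could be applied to the depthdata and mylocs objects from the question roughly as follows; the column selection and k=1 are assumptions:

# Hypothetical application of the FNN lookup to the question's objects;
# assumes depthdata (data frame with x, y, depth) and mylocs (x/y matrix) as defined above.
library(FNN)

nn <- get.knnx(as.matrix(depthdata[, c("x", "y")]),  # reference grid coordinates
               as.matrix(mylocs),                    # GPS query points
               k=1)                                  # single nearest neighbour
mydepths <- depthdata$depth[nn$nn.index[, 1]]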
