One source of speedup is to write a smaller version of ks.test that does less. ks.test2 below is more restrictive than ks.test: it assumes, for example, that there are no missing values and that you only ever want the statistic for the two-sided test. It also returns only the statistic D, not a p-value.
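```r
# Stripped-down two-sample KS statistic: no NA handling, no p-value,
# two-sided D only
ks.test2 <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  w <- c(x, y)
  # Walk the pooled sample in sorted order: +1/nx for each x, -1/ny for each y
  z <- cumsum(ifelse(order(w) <= nx, 1/nx, -1/ny))
  max(abs(z))
}
```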
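Why the `cumsum` gives the KS statistic: walking through the pooled sample in sorted order, each x value raises the running total by 1/nx and each y value lowers it by 1/ny, so (assuming no ties) after k steps the total equals Fx(t) - Fy(t) at the k-th smallest pooled value t, where Fx and Fy are the two empirical CDFs. D is the largest absolute gap between those CDFs, so `max(abs(z))` is D. A slower but more transparent way to compute the same quantity with `stats::ecdf` might look like this; `ks_stat_ecdf` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: two-sided two-sample KS statistic computed directly
# from the empirical CDFs, as a slow but transparent cross-check.
ks_stat_ecdf <- function(x, y) {
  Fx <- ecdf(x)        # step function: share of x values <= t
  Fy <- ecdf(y)        # step function: share of y values <= t
  w <- sort(c(x, y))   # both ECDFs only jump at pooled sample points,
                       # so the supremum is attained at one of them
  max(abs(Fx(w) - Fy(w)))
}

set.seed(999)
x <- rnorm(400)
y <- rnorm(400)
ks_stat_ecdf(x, y)   # should agree with ks.test(x, y)$statistic
```

Because it evaluates both ECDFs at every pooled point, this version is also correct when there are tied values, but it does more work than the `cumsum` approach.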
Check that the output matches ks.test:
```r
set.seed(999)
x <- rnorm(400)
y <- rnorm(400)

ks.test(x, y)$statistic
#     D 
# 0.045 

ks.test2(x, y)
# [1] 0.045
```
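The two agree here because `rnorm` draws contain no ties (with probability one). With tied values they can disagree: inside a run of equal values the running sum passes through positions that are not values of Fx - Fy, so ks.test2 can overstate D. As far as I can tell from its source, ks.test's two-sample code keeps only the last position of each tied run before taking the maximum. The sketch below mimics that; `ks_test2_ties` is a hypothetical name, and the extra `sort` gives back some of the speedup.

```r
# Hypothetical tie-aware variant of ks.test2. With tied values, the running
# sum is only a valid ECDF difference at the last element of each tied run,
# so intermediate positions are dropped before taking the maximum.
ks_test2_ties <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  w <- c(x, y)
  z <- cumsum(ifelse(order(w) <= nx, 1/nx, -1/ny))
  # Positions where the sorted pooled value changes, plus the final position
  keep <- c(which(diff(sort(w)) != 0), nx + ny)
  max(abs(z[keep]))
}

ks_test2_ties(c(1, 2, 2, 3), c(2, 3, 4))
# [1] 0.4166667
```

On the same input, ks.test2 as written returns 0.75, because it also counts the partial sums taken in the middle of the run of 2s.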
Now measure how much time the smaller function saves:
```r
library(microbenchmark)

microbenchmark(
  ks.test(x, y),
  ks.test2(x, y)
)
# Unit: microseconds
#            expr      min       lq      mean   median        uq      max neval cld
#   ks.test(x, y) 1030.238 1070.303 1347.3296 1227.207 1313.8490 6338.918   100   b
#  ks.test2(x, y)  709.719  730.048  832.9532  833.861  888.5305 1281.284   100  a
```

On this run, the median time fell from about 1227 to 834 microseconds, a saving of roughly 32%.
davechilders