Can I avoid a loop for operations inside a vector?

Is there a way to calculate the 4th column of a data table (timeout) without a for loop? The ith row of this column depends on the (i-1)th row, so the loop takes a long time as I increase the number of rows.

library(data.table)
dt <- data.table(
id = 1:200, 
timein = cumsum(runif(200,1,6)),
servtime = runif(200,3,4))

dt[,"timeout"] <- dt$timein # initialisation of timeout column

# update column timeout
for(i in 2:200) {
dt$timeout[i] <- max(dt$timein[i], dt$timeout[i-1]) +  dt$servtime[i]
} 
1 answer

I don’t see an easy way to vectorize this in base R, but you can use Rcpp to speed up the operation:

library(Rcpp)
get.timeout <- cppFunction("
NumericVector getTimeout(NumericVector timein, NumericVector servtime) {
  const int n = timein.size();
  NumericVector timeout(n);
  timeout[0] = timein[0];
  for (int i=1; i < n; ++i) {
    timeout[i] = fmax(timein[i], timeout[i-1]) + servtime[i];
  }
  return timeout;
}")

This is faster than the plain R for loop, by a factor of roughly 15 at the median in the benchmark below:

for.loop <- function(timein, servtime) {
  timeout <- timein  # start from the function argument, not the global dt
  n <- length(timeout)
  for(i in 2:n) {
    timeout[i] <- max(timein[i], timeout[i-1]) +  servtime[i]
  }
  return(timeout)
}
all.equal(for.loop(dt$timein, dt$servtime), get.timeout(dt$timein, dt$servtime))
# [1] TRUE
library(microbenchmark)
microbenchmark(for.loop(dt$timein, dt$servtime), get.timeout(dt$timein, dt$servtime))
# Unit: microseconds
#                                 expr     min       lq      mean   median       uq     max neval
#     for.loop(dt$timein, dt$servtime) 414.040 429.5315 438.68765 435.4000 445.1185 506.162   100
#  get.timeout(dt$timein, dt$servtime)  22.432  23.9305  28.54934  27.9135  28.6670  97.259   100

The advantage is likely to grow for larger inputs.
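
If compiling C++ is not an option, this particular recurrence can probably also be unrolled algebraically. The following is only a sketch and was not part of the benchmark above; the names timeout_loop and timeout_cummax are hypothetical. Writing S for the running sum of servtime and subtracting S[i] from both sides of timeout[i] = max(timein[i], timeout[i-1]) + servtime[i] leaves a running maximum, so cumsum and cummax should be enough. The result may differ from the loop in the last floating-point digits, so compare with all.equal rather than identical.

```r
# Loop version, used here only as a reference for checking the result
timeout_loop <- function(timein, servtime) {
  timeout <- timein
  for (i in seq_along(timein)[-1]) {
    timeout[i] <- max(timein[i], timeout[i - 1]) + servtime[i]
  }
  timeout
}

# Hypothetical vectorized version: with S = cumsum(servtime),
# u[i] = timeout[i] - S[i] satisfies u[i] = max(timein[i] - S[i-1], u[i-1])
timeout_cummax <- function(timein, servtime) {
  n <- length(timein)
  if (n < 2) return(timein)
  S <- cumsum(servtime)
  # The first element has no service time added, so u[1] = timein[1] - S[1];
  # every later element uses the previous running sum S[i-1]
  a <- timein - c(S[1], S[-n])
  cummax(a) + S
}

set.seed(1)
timein <- cumsum(runif(200, 1, 6))
servtime <- runif(200, 3, 4)
all.equal(timeout_loop(timein, servtime), timeout_cummax(timein, servtime))
```

With the data table from the question, the result could then be stored by reference with dt[, timeout := timeout_cummax(timein, servtime)].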
