It took me more time than I would like to admit, but here is my solution.
Since you said you want to use it on large datasets (so speed matters), I used Rcpp to write a loop that does all the checking. For the speed test, I also create a second sample dataset with 500,000 data points and benchmark on it (I tried to compare against the other approaches, but could not translate them to data.table, and without that it would be an unfair comparison ...). If that changes, I am happy to update the speed comparison!
Part 1: My Solution
My solution looks like this:
(in length_time.cpp)
    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    NumericVector length_time(NumericVector time, NumericVector v) {
      double start = 0;
      double time_i, v_i;
      bool last_positive = v[0] > 0;
      bool last_negative = v[0] < 0;
      int length_i = time.length();
      NumericVector ret_vec(length_i);

      for (int i = 0; i < length_i; ++i) {
        time_i = time[i];
        v_i = v[i];

        if (v_i == 0) { // v hits exactly zero: the regime ends at this time point
          if (i > 0) { // if this is not the beginning, then a regime has ended!
            ret_vec[i - 1] = time_i - start;
            start = time_i;
          }
        } else if ((v_i > 0 && last_negative) || (v_i < 0 && last_positive)) {
          // sign change between i - 1 and i: take the midpoint of the two
          // time stamps as the (approximate) crossing point
          ret_vec[i - 1] = (time_i + time[i - 1]) / 2 - start;
          start = (time_i + time[i - 1]) / 2;
        }

        last_positive = v_i > 0;
        last_negative = v_i < 0;
      }
      // the last regime runs until the final observation
      ret_vec[length_i - 1] = time[length_i - 1] - start;

      // at this point ret_vec only holds each regime's length at that regime's
      // last observation; spread it backwards over the whole regime
      // (something like a reverse na.locf)
      double tmp_val = ret_vec[length_i - 1];
      for (int i = length_i - 1; i >= 0; --i) {
        if (v[i] == 0) {
          ret_vec[i] = 0;
        } else if (ret_vec[i] == 0) {
          ret_vec[i] = tmp_val;
        } else {
          tmp_val = ret_vec[i];
        }
      }
      return ret_vec;
    }
and then in the R file (e.g. length_time.R):
    library(Rcpp)
    # setwd("...") # so that sourceCpp() finds the .cpp file
    sourceCpp("length_time.cpp")

    dat$Length <- length_time(dat$Time, dat$V)
    dat
    #    Time  V Length
    # 1   0.5 -2   1.50
    # 2   1.0 -1   1.50
    # 3   1.5  0   0.00
    # 4   2.0  2   1.00
    # 5   2.5  0   0.00
    # 6   3.0  1   1.75
    # 7   3.5  2   1.75
    # 8   4.0  1   1.75
    # 9   4.5 -1   0.75
    # 10  5.0 -3   0.75
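(dat itself is the sample data from the question and is not constructed above; if you want to reproduce the call, it can be rebuilt from the printed output:)

    # reconstructed from the printed output above
    dat <- data.frame(Time = seq(0.5, 5, by = 0.5),
                      V    = c(-2, -1, 0, 2, 0, 1, 2, 1, -1, -3))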
This seems to work on the sample dataset.
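To make the logic easier to follow (and to double-check the compiled version), here is a pure-R transcription of the same algorithm. This is only a sketch I am adding for illustration; length_time_r() is a hypothetical helper, not part of the solution above, and it will be far slower than the Rcpp loop:

    # hypothetical pure-R transcription of length_time(), for illustration only
    length_time_r <- function(time, v) {
      n <- length(time)
      ret <- numeric(n)
      start <- 0
      last_positive <- v[1] > 0
      last_negative <- v[1] < 0

      for (i in seq_len(n)) {
        if (v[i] == 0) {
          # v hits exactly zero: the regime ends at this time point
          if (i > 1) {
            ret[i - 1] <- time[i] - start
            start <- time[i]
          }
        } else if ((v[i] > 0 && last_negative) || (v[i] < 0 && last_positive)) {
          # sign change: use the midpoint of the two time stamps as the crossing
          mid <- (time[i] + time[i - 1]) / 2
          ret[i - 1] <- mid - start
          start <- mid
        }
        last_positive <- v[i] > 0
        last_negative <- v[i] < 0
      }
      ret[n] <- time[n] - start   # the last regime runs to the final observation

      # backward pass: spread each regime's length over all its observations
      tmp <- ret[n]
      for (i in n:1) {
        if (v[i] == 0) {
          ret[i] <- 0
        } else if (ret[i] == 0) {
          ret[i] <- tmp
        } else {
          tmp <- ret[i]
        }
      }
      ret
    }

On the sample data the two implementations should agree:

    all.equal(length_time(dat$Time, dat$V),
              length_time_r(dat$Time, dat$V))
    # should return TRUE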
Part 2: Speed Testing
    library(data.table)
    library(microbenchmark)

    n <- 500000
    set.seed(1235278)
    dt <- data.table(time = seq(from = 0.5, by = 0.5, length.out = n),
                     v = cumsum(round(rnorm(n, sd = 1))))

    # flag upward zero crossings (used for the plot below)
    dt[, chg := v >= 0 & shift(v, 1, fill = 0) <= 0]

    plot(dt$time, dt$v, type = "l")
    abline(h = 0)
    for (i in dt[chg == T, time]) abline(v = i, lty = 2, col = "red")
This results in a dataset with 985 flagged crossings (the dashed red lines in the plot).
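If you want to verify that count yourself, the flagged rows can be counted directly (assuming the chg column defined above):

    # count of flagged crossings; 985 as noted above
    dt[chg == TRUE, .N]
    # [1] 985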

Speed testing with microbenchmark gives:
    microbenchmark(dt[, length := length_time(time, v)])
    # Unit: milliseconds
    #                                      expr      min     lq     mean   median       uq      max neval
    #  dt[, `:=`(length, length_time(time, v))] 2.625714 2.7184 3.054021 2.817353 3.077489 5.235689   100
That is roughly 3 milliseconds for the calculation on 500,000 observations.
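To quantify what the compiled loop buys, one could also benchmark the hypothetical pure-R sketch from Part 1 against it; this is only a sketch, with times reduced because the R loop over 500,000 rows is slow:

    # length_time_r() is the hypothetical pure-R helper sketched above,
    # not part of the original solution
    microbenchmark(
      rcpp   = dt[, length   := length_time(time, v)],
      pure_r = dt[, length_r := length_time_r(time, v)],
      times = 10
    )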
Does this help you?