Here's a quick workaround that relies heavily on data.table internals (which makes the code a bit fragile, imo). When you `setkey`, data.table sorts NaN (and NA) before every other value, as if it were a very, very negative number, so those rows always end up at the beginning of your data.table. We can use this property to pick out those entries:
```r
# index of the first row whose key is *not* NaN/NA
my.dt[J(-.Machine$double.xmax), roll = -Inf, which = TRUE]

# returns the NaN/NA rows, i.e. the same as my.dt[is.na(x)], but much faster
my.dt[seq_len(my.dt[J(-.Machine$double.xmax), roll = -Inf, which = TRUE] - 1)]
```
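A minimal sketch of the idea on a tiny table, assuming a double key column named `x` (the table and the names `dt`, `first_ok` and `missing_rows` are made up for illustration). It checks that the rows above the first non-missing key are exactly the NA/NaN rows:

```r
library(data.table)

# Toy table (hypothetical): a double key with one NaN and one NA.
dt <- data.table(x = c(3.5, NaN, 1.2, NA, 7.0, 1.2))
setkey(dt, x)  # NA and NaN are sorted to the top

# Roll the smallest finite double backwards (NOCB, roll = -Inf) onto the
# next existing key; that row is the first one with a non-missing key.
first_ok <- dt[J(-.Machine$double.xmax), roll = -Inf, which = TRUE,
               mult = "first"]

# Every row above it has a missing key.
missing_rows <- dt[seq_len(first_ok - 1L)]
print(missing_rows)
```

Using `seq_len(first_ok - 1L)` rather than `1:(first_ok - 1)` matters when there are no missing keys: `first_ok` is then 1, and `seq_len(0)` gives an empty selection, whereas `1:0` would select rows 1 and 0.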
Here is an example using Ricardo's test data:
```r
my.dt <- as.data.table(replicate(20, sample(100, 1e5, TRUE)))
setnames(my.dt, 1, "ID")
my.dt[sample(1e5, 1e3), ID := NA]
setkey(my.dt, ID)
```
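A sketch of the same check on Ricardo's data. Because `ID` is an integer column here, it uses `-.Machine$integer.max` as the lookup value (the same idea as `minN` below) and compares the result with a plain `is.na()` scan. The names `first_ok`, `fast` and `slow` are illustrative only; no timings are claimed.

```r
library(data.table)

# Ricardo's test data: integer ID key with 1,000 rows set to NA.
my.dt <- as.data.table(replicate(20, sample(100, 1e5, TRUE)))
setnames(my.dt, 1, "ID")
my.dt[sample(1e5, 1e3), ID := NA]
setkey(my.dt, ID)

# ID is integer, so look up the smallest non-NA integer
# (NA_integer_ itself occupies -2^31, one below -.Machine$integer.max).
first_ok <- my.dt[J(-.Machine$integer.max), roll = -Inf, which = TRUE,
                  mult = "first"]
fast <- my.dt[seq_len(first_ok - 1L)]

# Reference result using an ordinary vector scan.
slow <- my.dt[is.na(ID)]
```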
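To handle key columns of other types, the following `minN` function returns the smallest non-missing value for the vector's type; in my tests it also covers character and logical vectors: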
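```r
# smallest non-missing value for the type of x
minN <- function(x) {
  if (is.integer(x)) {
    -.Machine$integer.max   # NA_integer_ is -2^31, so this is the smallest real integer
  } else if (is.numeric(x)) {
    -.Machine$double.xmax
  } else if (is.character(x)) {
    ""
  } else if (is.logical(x)) {
    FALSE
  } else {
    NA                      # unsupported type
  }
}
```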
You also need to add `mult = 'first'` so that the join returns a single row index, for example:
```r
# colname stands for the key column vector itself, e.g. my.dt$ID
my.dt[seq_len(my.dt[J(minN(colname)), roll = -Inf, which = TRUE, mult = 'first'] - 1)]
```
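Putting the pieces together, here is a sketch of a small wrapper, assuming the table is already keyed and the missing values are in its first key column. The name `na_key_rows` is hypothetical and not part of data.table. The checks below only exercise integer and double keys; the character and logical branches of `minN` are kept as in the answer but are not tested here.

```r
library(data.table)

# Smallest non-missing value for the vector's type (minN from the answer).
minN <- function(x) {
  if (is.integer(x)) {
    -.Machine$integer.max
  } else if (is.numeric(x)) {
    -.Machine$double.xmax
  } else if (is.character(x)) {
    ""
  } else if (is.logical(x)) {
    FALSE
  } else {
    NA
  }
}

# Hypothetical helper: rows whose first key column is NA/NaN.
na_key_rows <- function(dt) {
  if (is.null(key(dt))) stop("dt must be keyed (see setkey)")
  lowest_val <- minN(dt[[key(dt)[1L]]])
  # Index of the first row with a non-missing key; NA if every key is missing.
  first_ok <- dt[J(lowest_val), roll = -Inf, which = TRUE, mult = "first"]
  if (is.na(first_ok)) return(dt)
  dt[seq_len(first_ok - 1L)]
}
```

Returning `dt` when the lookup finds no match keeps the all-missing case from failing on `seq_len(NA)`.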