A base R approach could be
    df <- the.df[order(the.df$x, the.df$n),]
    df$val <- ave(df$val, df$x, FUN=fun)
As for `fun`, @DavidArenburg's answer in plain R, written a bit more poetically, could be
    fun0 <- function(v) {
        idx <- which.max(v[2:length(v)] == 0L) + 1L
        if (length(idx))
            v[idx:length(v)] <- 0L
        v
    }
It seems like a good idea to formulate the solution as a standalone function first, because it is then easy to test. `fun0` fails for some edge cases, e.g.,
    > fun0(0)
    [1] 0 0 0
    > fun0(1)
    [1] 0 0 0
    > fun0(c(1, 1))
    [1] 1 0

(The culprits: `2:length(v)` counts down when `length(v)` is 1, and `which.max()` returns an index even when no element is TRUE.)
A better version is
    fun1 <- function(v) {
        tst <- tail(v, -1) == 0L
        if (any(tst)) {
            idx <- which.max(tst) + 1L
            v[idx:length(v)] <- 0L
        }
        v
    }
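For instance, it now handles the cases that tripped up `fun0`:

    > fun1(0)
    [1] 0
    > fun1(1)
    [1] 1
    > fun1(c(1, 1))
    [1] 1 1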
And even better, following @Arun
    fun <- function(v)
        if (length(v) > 2) c(v[1], cummin(v[-1])) else v
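For 0/1 values it behaves the same way on the edge cases and zeroes out everything after the first 0 past position 1; for example, with made-up inputs and a toy data frame invented here just to show the call pattern:

    > fun(c(1, 1))
    [1] 1 1
    > fun(c(1, 1, 0, 1, 1))
    [1] 1 1 0 0 0
    > fun(c(0, 1, 1))    # a leading zero does not propagate
    [1] 0 1 1

    the.df <- data.frame(
        x   = rep(1:2, c(4, 3)),
        n   = c(1:4, 1:3),
        val = c(1, 1, 0, 1, 1, 0, 1)
    )
    df <- the.df[order(the.df$x, the.df$n),]
    df$val <- ave(df$val, df$x, FUN=fun)
    df$val
    ## [1] 1 1 0 0 1 0 0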
This is competitive (within an order of magnitude) with the data.table solution, the ordering and the `ave()` step together taking less than 1 s for data on the order of ~10M rows; see @m-dz's timings. At millions of rows per second, there is little point in optimizing further.
However, when there is a very large number of small groups (e.g., 2M groups of 5 rows each), an improvement is to avoid the per-group function calls (the `tapply()`-style iteration) and instead use the group identity to offset the minimum. For example,
    df <- df[order(df$x, df$n),]
    grp <- match(df$x, unique(df$x))
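One way this idea could be completed (a sketch only, assuming `val` is non-negative, e.g. a 0/1 flag, and that the first row of each group stays exempt as in `fun()` above) is to shift each group's values by an offset that grows with the group id, so that a single `cummin()` over the whole column cannot carry a minimum across group boundaries, and then undo the shift:

    ## sketch: group-wise cumulative minimum via one cummin() call,
    ## assuming val >= 0 and df already ordered as above
    off   <- grp * (max(df$val) + 1)   # group-specific offset, increasing with grp
    first <- !duplicated(grp)          # first row of each group
    v <- df$val
    v[first] <- max(df$val)            # stop a leading zero from propagating
    res <- cummin(v - off) + off       # per-group running minimum in one pass
    res[first] <- df$val[first]        # put the exempt first rows back
    df$val <- res

This replaces the per-group function calls with a handful of vectorized operations over the whole column, which is where the saving comes from when there are millions of tiny groups.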