Add a countdown column to the data.table containing the rows until a special row is found

Question


I have a data.table with ordered data, and I want to add a column that tells me how many records remain until the “special” record, which resets the countdown.

For instance:

    library(data.table)
    DT = data.table(idx = c(1,3,3,4,6,7,7,8,9),
                    name = c("a", "a", "a", "b", "a", "a", "b", "a", "b"))
    setkey(DT, idx)
    # manually add the answer
    DT[, countdown := c(3,2,1,0,2,1,0,1,0)]

Gives

    > DT
       idx name countdown
    1:   1    a         3
    2:   3    a         2
    3:   3    a         1
    4:   4    b         0
    5:   6    a         2
    6:   7    a         1
    7:   7    b         0
    8:   8    a         1
    9:   9    b         0

The countdown column tells me how many rows remain until the next row named "b". The question is how to create this column in code.

Note that the key is not evenly spaced and may contain duplicates, so it is not much help in solving the problem. In general, the non-b names may differ, but I could add a dummy TRUE/FALSE column if a solution requires it.
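As a baseline, here is a minimal sketch in base R (no data.table needed) that works from a dummy TRUE/FALSE column like the one mentioned above, so the non-b names do not matter. The function name `countdown_to_special` is hypothetical, and giving NA to rows that have no special row after them is an assumption of this sketch, not part of the question.

```r
# Hypothetical helper (not from any answer): rows remaining until the next
# special row. `is_special` is a logical vector in row order; special rows get 0.
# Assumption: rows after the last special row have nothing to count down to,
# so they get NA.
countdown_to_special <- function(is_special) {
  n <- length(is_special)
  out <- rep(NA_integer_, n)
  remaining <- NA_integer_
  # Walk backwards so each row knows its distance to the next special row
  for (i in rev(seq_len(n))) {
    if (is_special[i]) {
      remaining <- 0L
    } else if (!is.na(remaining)) {
      remaining <- remaining + 1L
    }
    out[i] <- remaining
  }
  out
}

name <- c("a", "a", "a", "b", "a", "a", "b", "a", "b")
countdown_to_special(name == "b")
# [1] 3 2 1 0 2 1 0 1 0
```

With data.table this could be used as `DT[, countdown := countdown_to_special(name == "b")]`. An explicit R loop is easy to read but slow on large tables, so the vectorized answers are preferable at scale.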


r data.table

Corone Mar 05 '13 at 18:32


3 answers

I'm sure (or at least hope) that someone will come up with a pure data.table solution, but in the meantime you can use rle. Because the count runs backwards, we use rev to reverse the "name" values first.

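    # Count up within each run of the reversed names
    output <- sequence(rle(rev(DT$name))$lengths)
    # Last position of each odd-numbered run; these are the "b" runs
    # because the reversed vector starts with "b"
    makezero <- cumsum(rle(rev(DT$name))$lengths)[c(TRUE, FALSE)]
    output[makezero] <- 0
    # Reverse back into the original row order
    DT[, countdown := rev(output)]
    DT
    #    idx name countdown
    # 1:   1    a         3
    # 2:   3    a         2
    # 3:   3    a         1
    # 4:   4    b         0
    # 5:   6    a         2
    # 6:   7    a         1
    # 7:   7    b         0
    # 8:   8    a         1
    # 9:   9    b         0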
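To see why this works, the intermediate vectors for the example can be printed with base R alone (the short name `r` is illustrative). The `c(TRUE, FALSE)` index picks the odd-numbered runs, which are the "b" runs only because the reversed vector starts with "b", i.e. the table ends with a "b" row.

```r
# Base R only: trace the rle()/rev() answer on the example names
name <- c("a", "a", "a", "b", "a", "a", "b", "a", "b")

r <- rle(rev(name))
r$values    # "b" "a" "b" "a" "b" "a"
r$lengths   # 1 1 1 2 1 3

output <- sequence(r$lengths)                  # 1 1 1 1 2 1 1 2 3
makezero <- cumsum(r$lengths)[c(TRUE, FALSE)]  # 1 3 6: where the "b" runs end
output[makezero] <- 0
rev(output)                                    # 3 2 1 0 2 1 0 1 0
```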


A5C1D2H2I1M1N2O1R2T1 Mar 05 '13 at 18:51


Here's a mix of Josh's and Ananda's solutions: I use rle to create the groups the way Josh's answer does:

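    t <- rle(DT$name)
    t <- t$lengths[t$values == "a"]
    # One distinct group id per run of "a" plus the "b" that follows it
    DT[, cd := rep(seq_along(t), t + 1)]
    # Count down within each group
    DT[, cd := max(.I) - .I, by = cd]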
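The group labels must be distinct per group: labelling each group by its run length (`rep(t, t + 1)`) would merge two groups whose "a" runs happen to have the same length, whereas `rep(seq_along(t), t + 1)` gives one label per group. A base R sketch of that pitfall, with `ave()` standing in for the grouped `max(.I) - .I` (names `t_len`, `grp`, `idx` are illustrative):

```r
# Base R only: two runs of "a" with the same length (2)
name <- c("a", "a", "b", "a", "a", "b")
r <- rle(name)
t_len <- r$lengths[r$values == "a"]     # 2 2

idx <- seq_along(name)
countdown_by <- function(i) max(i) - i  # same idea as max(.I) - .I

# Run lengths as labels: both groups get label 2 and are merged
ave(idx, rep(t_len, t_len + 1), FUN = countdown_by)   # 5 4 3 2 1 0

# Sequential ids as labels: one label per group
grp <- rep(seq_along(t_len), t_len + 1)               # 1 1 1 2 2 2
ave(idx, grp, FUN = countdown_by)                     # 2 1 0 2 1 0
```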

Even better: taking advantage of the fact that there is always exactly one b after each run of a (or at least assuming so here), you can simplify it to:

    t <- rle(DT$name)
    t <- t$lengths[t$values == "a"]
    DT[, cd := rev(sequence(rev(t + 1))) - 1]
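Why the one-liner works: each group is a run of t "a" rows followed by one "b", so its countdown is t, t-1, ..., 0. Building sequence() over the group sizes in reverse order and then reversing the whole result yields exactly those countdowns in the original row order. A base R trace for the example's run lengths 3, 2, 1 (the name `t_len` is illustrative):

```r
# Base R only: step through rev(sequence(rev(t + 1))) - 1
t_len <- c(3, 2, 1)                 # lengths of the "a" runs in the example
rev(t_len + 1)                      # 2 3 4   group sizes, last group first
sequence(rev(t_len + 1))            # 1 2 1 2 3 1 2 3 4
rev(sequence(rev(t_len + 1))) - 1   # 3 2 1 0 2 1 0 1 0
```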

Edit: From the OP's comment, there can be more than one b in a row, and in that case every b must be 0. The first step is to create groups, each ending with the run of b's that follows a run of a's.

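    DT <- data.table(idx = sample(10),
                     name = c("a","a","a","b","b","a","a","b","a","b"))
    t <- rle(DT$name)
    # Row numbers where each run of "b" ends
    val <- cumsum(t$lengths)[t$values == "b"]
    # Each group runs up to and including the end of one "b" run
    DT[, grp := rep(seq_along(val), c(val[1], diff(val)))]
    # Within a group: count down over the "a" rows, then 0 for every "b" row
    DT[, val := c(rev(seq_len(sum(name == "a"))), rep(0, sum(name == "b"))), by = grp]
    # (idx comes from sample(10), so it differs between runs)
    #     idx name grp val
    #  1:   1    a   1   3
    #  2:   7    a   1   2
    #  3:   9    a   1   1
    #  4:   4    b   1   0
    #  5:   2    b   1   0
    #  6:   8    a   2   2
    #  7:   6    a   2   1
    #  8:   3    b   2   0
    #  9:  10    a   3   1
    # 10:   5    b   3   0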
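The same grouping can be checked on a plain character vector in base R. This sketch reuses the rle/cumsum/diff steps and replaces the `by = grp` step with `split()` and `lapply()`, which is valid here because each group is a contiguous block of rows. Like the answer, it assumes the data ends with a "b" row (otherwise the trailing "a" rows get no group). The names `r`, `ends`, `grp`, and `countdown` are illustrative.

```r
# Base R only: the grouping with consecutive "b" rows
name <- c("a", "a", "a", "b", "b", "a", "a", "b", "a", "b")

r <- rle(name)
ends <- cumsum(r$lengths)[r$values == "b"]           # 5 8 10: end of each "b" run
grp <- rep(seq_along(ends), c(ends[1], diff(ends)))  # 1 1 1 1 1 2 2 2 3 3

# Within a group the "a" rows come first, then the "b" rows
countdown <- unlist(lapply(split(name, grp), function(g) {
  c(rev(seq_len(sum(g == "a"))), rep(0, sum(g == "b")))
}), use.names = FALSE)
countdown   # 3 2 1 0 0 2 1 0 1 0
```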


Arun Mar 05 '13 at 19:19


Josh o'brien · Accepted Answer · 2013-03-05T19:08:57+0000

Here is another idea:

    ## Create groups that end at each occurrence of "b"
    DT[, cd := 0L]
    DT[name == "b", cd := 1L]
    DT[, cd := rev(cumsum(rev(cd)))]
    ## Count down within them
    DT[, cd := max(.I) - .I, by = cd]
    #    idx name cd
    # 1:   1    a  3
    # 2:   3    a  2
    # 3:   3    a  1
    # 4:   4    b  0
    # 5:   6    a  2
    # 6:   7    a  1
    # 7:   7    b  0
    # 8:   8    a  1
    # 9:   9    b  0
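For comparison, a sketch of the same idea in base R, with `ave()` standing in for the grouped `max(.I) - .I`. Because the group ids come from a reversed cumulative count of "b" rows, consecutive "b" rows each end up at 0 as well. Rows after the last "b" would form their own group (id 0) and count down to the final row rather than to a "b", so this sketch assumes the data ends with "b". The names `is_b`, `grp`, and `cd` are illustrative.

```r
# Base R only: group ids from a reversed cumulative count of "b" rows,
# then count down within each group (like max(.I) - .I)
name <- c("a", "a", "a", "b", "a", "a", "b", "a", "b")

is_b <- name == "b"
grp <- rev(cumsum(rev(is_b)))   # 3 3 3 3 2 2 2 1 1
cd <- ave(seq_along(name), grp, FUN = function(i) max(i) - i)
cd   # 3 2 1 0 2 1 0 1 0
```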

