Number rows within each id, restarting the count after a condition is met

I want to number rows within groups of a data frame (which is ordered by id and time):

    tc <- textConnection('
    id  time end_yn
    abc 10   0
    abc 11   0
    abc 12   1
    abc 13   0
    def 10   0
    def 15   1
    def 16   0
    def 17   0
    def 18   1
    ')
    test <- read.table(tc, header = TRUE)

The goal is to create a new column ("number") that numbers the rows of each id from 1 to n until a row with end_yn == 1 is reached. After such a row, numbering should restart at 1 (within the same id).
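A minimal sketch of the rule exactly as stated, assuming a plain row-by-row loop is acceptable for small data: the counter restarts at 1 on the first row, whenever id changes, and on the row right after an end_yn == 1 row. The helper `number_until_end()` is hypothetical (not from the question) and relies on the rows already being ordered by id and time.

```r
# Hypothetical helper (not from the original post): literal, loop-based
# version of the numbering rule. Assumes rows are sorted by id, then time.
number_until_end <- function(id, end_yn) {
  n <- length(id)
  number <- integer(n)
  for (i in seq_len(n)) {
    # Restart on the first row, when id changes, or right after end_yn == 1
    restart <- i == 1 || id[i] != id[i - 1] || end_yn[i - 1] == 1
    number[i] <- if (restart) 1L else number[i - 1] + 1L
  }
  number
}

test <- read.table(text = "
id  time end_yn
abc 10   0
abc 11   0
abc 12   1
abc 13   0
def 10   0
def 15   1
def 16   0
def 17   0
def 18   1
", header = TRUE)

test$number <- number_until_end(test$id, test$end_yn)
print(test)
```

The loop is easy to check by hand but scales poorly; it is mainly useful as a reference to compare vectorized solutions against.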

Ignoring the end_yn == 1 condition, the rows of each id can be numbered with:

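    library(data.table)
    DT <- data.table(test)
    # Write the counter to a new column; assigning to id itself would
    # overwrite the grouping column
    DT[, number := seq_len(.N), by = id]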
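For comparison, a base-R sketch of the same unconditional per-id numbering, assuming data.table is not available: `ave()` applies `seq_along` within each id, which mirrors `seq_len(.N)` grouped by id.

```r
test <- read.table(text = "
id  time end_yn
abc 10   0
abc 11   0
abc 12   1
abc 13   0
def 10   0
def 15   1
def 16   0
def 17   0
def 18   1
", header = TRUE)

# Number rows 1..n within each id, ignoring end_yn entirely
test$number <- ave(seq_len(nrow(test)), test$id, FUN = seq_along)
print(test)
```

As in the data.table version, this counts straight through each id and never restarts after end_yn == 1.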

However, the expected result should be:

    id  time end_yn number
    abc 10   0      1
    abc 11   0      2
    abc 12   1      3
    abc 13   0      1
    def 10   0      1
    def 15   1      2
    def 16   0      1
    def 17   0      2
    def 18   1      3

How can I incorporate the end_yn == 1 condition?

1 answer

There are probably several ways to do this, but here is one:

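    # Running count of end_yn == 1 rows seen *before* the current row
    # (cumsum shifted down by one), so a closing row stays in its segment
    DT[, cEnd := c(0, cumsum(end_yn)[-.N])]
    # Number rows 1..n within each (id, segment) combination
    DT[, number := seq_len(.N), by = "id,cEnd"]
    # Remove the helper column created above
    DT[, cEnd := NULL]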

It may be worth setting id as a key on DT, e.g. with setkey(DT, id).
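The same shifted-cumsum idea can be sketched in base R, assuming data.table is not available. One deliberate difference from the answer: here the shifted cumsum is computed within each id (via `ave()`) rather than across the whole table; because rows are grouped by id afterwards anyway, both give the same numbering. The temporary column name `seg` is illustrative only.

```r
test <- read.table(text = "
id  time end_yn
abc 10   0
abc 11   0
abc 12   1
abc 13   0
def 10   0
def 15   1
def 16   0
def 17   0
def 18   1
", header = TRUE)

# Segment index within each id: count of end_yn == 1 rows strictly before
# the current row, so the end_yn == 1 row still closes its own segment
test$seg <- ave(test$end_yn, test$id,
                FUN = function(x) cumsum(c(0, head(x, -1))))

# Number rows 1..n within each (id, seg) combination
test$number <- ave(seq_len(nrow(test)), test$id, test$seg, FUN = seq_along)

# Drop the temporary column
test$seg <- NULL
print(test)
```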
