Subset data at the first instance of a condition in R

Data:

    row A B
    1   1 1
    2   1 1
    3   1 2
    4   1 3
    5   1 1
    6   1 2
    7   1 3

Hello all! What I'm trying to do (example above) is sum the values in column A, but only where column B == 1 (so starting with the simple subset below).

 sum(data$A[data$B==1]) 

However, I want to do this only for the first run of rows where this condition holds, up to the point where the values in B change. If the condition occurs again later in the column (row 5 in the example), I'm not interested in that!

I'd be very grateful for your help with this (I suspect simple) problem!

3 answers

Using data.table for syntax elegance, you can use rle to accomplish this:

    library(data.table)
    DT <- data.table(data)

    # Number each run of B == 1 (1, 2, ...); rows where B != 1 get 0
    DT[, B1 := {
      bb <- rle(B == 1)
      r <- bb$values
      r[r] <- seq_len(sum(r))
      bb$values <- r
      inverse.rle(bb)
    }]

    # Sum A over the first run only
    DT[B1 == 1, sum(A)]
    # [1] 2
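To see what the rle step is doing, here is a minimal base R sketch of the same idea without data.table. The data frame is reconstructed from the table in the question, and `runs`, `run_id` and `first_run_sum` are names made up for this illustration.

```r
# Data reconstructed from the question: A is all 1s, B is 1, 1, 2, 3, 1, 2, 3
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

# Run-length encode the condition:
# lengths 2 2 1 2, values TRUE FALSE TRUE FALSE
runs <- rle(data$B == 1)

# Replace each TRUE run by its running number (1, 2, ...); FALSE runs become 0
run_id <- runs$values
run_id[run_id] <- seq_len(sum(run_id))
runs$values <- run_id

# Expand back to one value per row: 1 1 0 0 2 0 0
data$B1 <- inverse.rle(runs)

# Keep only the rows belonging to run number 1
first_run_sum <- sum(data$A[data$B1 == 1])
first_run_sum
```

Because every run of B == 1 gets its own number, the same column would also let you pick the second or third run by changing the `== 1` comparison.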

Here is a pretty tricky way to do this:

    data$counter = cumsum(data$B == 1)
    sum(data$A[(data$counter >= 1:nrow(data) - sum(data$counter == 0)) &
               (data$counter != 0)])
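Since this one is not explained, here is a rough walk-through of why it works, using data reconstructed from the question. The names `counter`, `offset` and `in_first_run` are only for illustration. The counter grows by one on every row of the first run, so it keeps pace with the row number (minus any leading rows where B != 1); as soon as a non-1 row appears, it falls behind and can never catch up.

```r
# Data reconstructed from the question: A is all 1s, B is 1, 1, 2, 3, 1, 2, 3
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

# Running count of rows with B == 1: 1 2 2 2 3 3 3
counter <- cumsum(data$B == 1)

# Number of rows before the first B == 1 (0 for this data)
offset <- sum(counter == 0)

# TRUE only while the counter keeps up with the (shifted) row number,
# excluding any leading rows where the counter is still 0
in_first_run <- (counter >= seq_len(nrow(data)) - offset) & (counter != 0)

first_run_sum <- sum(data$A[in_first_run])
first_run_sum
```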

Another way:

    idx <- which(data$B == 1)
    sum(data$A[idx[idx == (seq_along(idx) + idx[1] - 1)]])
    # [1] 2

    # or alternatively
    sum(data$A[idx[idx == seq(idx[1], length.out = length(idx))]])
    # [1] 2

Idea: first get all the indices where B is 1. Suppose these are c(2, 3, 5) (for the data above they are c(1, 2, 5); the logic is the same). Starting from the first index, 2, you want to keep only the indices that are consecutive, i.e. that follow c(2, 3, 4, 5, ...). So, starting at 2, generate that many consecutive numbers and compare them element-wise with the indices. They stop being equal at the first gap, and once there is a mismatch, every later index mismatches as well. The leading indices that still match are therefore exactly the ones in the first consecutive run, which is what you want.
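The comparison can be made visible by printing the intermediate vectors. This is only a sketch: the first data frame is reconstructed from the question, and the second one (`shifted`) is a made-up example whose matching indices are c(2, 3, 5).

```r
# Data reconstructed from the question
data <- data.frame(A = rep(1, 7), B = c(1, 1, 2, 3, 1, 2, 3))

idx <- which(data$B == 1)                           # 1 2 5
expected <- seq(idx[1], length.out = length(idx))   # 1 2 3
keep <- idx[idx == expected]                        # 1 2
first_run_sum <- sum(data$A[keep])

# Hypothetical data where the first B == 1 is in row 2
shifted <- data.frame(A = c(10, 20, 30, 40, 50), B = c(3, 1, 1, 2, 1))

idx2 <- which(shifted$B == 1)                          # 2 3 5
expected2 <- seq(idx2[1], length.out = length(idx2))   # 2 3 4
keep2 <- idx2[idx2 == expected2]                       # 2 3
shifted_sum <- sum(shifted$A[keep2])                   # 20 + 30
```

In both cases the last index (5) no longer equals its expected value, so it is dropped and only the first consecutive block is summed.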

