dplyr: override all but the first occurrence of a value within a group

I have a grouped data_frame with a tag column that takes the values 0 and 1. In each group I need to find the first occurrence of 1 and set all other occurrences to 0. Is there a way to achieve this in dplyr?

For example, let's take the "iris" data and add an extra "tag" column:

    library(dplyr)

    data(iris)
    set.seed(1)
    iris$tag <- sample(c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))
    giris <- iris %>% group_by(Species)

In "giris", within the group "setosa", I need to keep only the first occurrence of 1 (i.e. in the 4th row) and set the remaining ones to 0. It is like applying a mask or something similar.

Is there any way to do this? I experimented with "which" and "duplicated", but without success. I also thought about filtering only the 1s, saving them, and then joining them back to the remaining rows, but that seems clumsy, especially for a 12 GB data set.

Thanks in advance!

r dplyr which
2 answers

A dplyr option:

 mutate(giris, newcol = as.integer(tag & cumsum(tag) == 1)) 

Or

 mutate(giris, newcol = as.integer(tag & !duplicated(tag))) 

Or, using data.table, the same approach but modifying the data by reference:

    library(data.table)

    setDT(giris)  # convert to a data.table in place
    giris[, newcol := as.integer(tag & cumsum(tag) == 1), by = Species]
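To see why both masks single out only the first 1, it can help to run them on a small vector. The sketch below uses made-up tag values standing in for a single group (not the iris data) and only base R:

```r
# Made-up tag values for one group (illustration only)
tag <- c(0, 0, 1, 0, 1, 1, 0)

# cumsum(tag) == 1 is TRUE from the first 1 up to just before the second 1;
# `tag &` then drops the zeros that fall inside that stretch
first_by_cumsum <- as.integer(tag & cumsum(tag) == 1)

# duplicated(tag) is FALSE only for the first 0 and the first 1;
# `tag &` then drops the first 0
first_by_dup <- as.integer(tag & !duplicated(tag))

print(first_by_cumsum)  # 0 0 1 0 0 0 0
print(first_by_dup)     # 0 0 1 0 0 0 0
```

Inside a grouped mutate the same expressions are evaluated once per group, so each group keeps its own first 1.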

We can try

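    # Note: this keeps the whole first run of 1s, so it leaves a single 1
    # only when the first 1 in a group is followed by a 0.
    res <- giris %>%
      group_by(Species) %>%
      mutate(tag1 = ifelse(cumsum(c(TRUE, diff(tag) < 0)) != 1, 0, tag))

    table(res[c("Species", "tag1")])
    #             tag1
    # Species       0  1
    #   setosa     49  1
    #   versicolor 49  1
    #   virginica  49  1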
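One way to compare the diff-based expression with the cumsum-based mask is to run both on a small hand-made vector. The values below are hypothetical and chosen so that the first 1 is immediately followed by another 1, a case that does not occur in the sampled iris tags shown above:

```r
# Hypothetical tag values for one group: the first 1 is followed by another 1
tag <- c(0, 1, 1, 0, 1)

# diff-based expression: keeps tag until the first drop from 1 to 0
by_diff <- ifelse(cumsum(c(TRUE, diff(tag) < 0)) != 1, 0, tag)

# cumsum-based mask: keeps only the position where the running count first hits 1
by_cumsum <- as.integer(tag & cumsum(tag) == 1)

print(by_diff)    # 0 1 1 0 0
print(by_cumsum)  # 0 1 0 0 0
```

On data where consecutive 1s can open a group, the two expressions therefore give different results, and the cumsum-based mask is the one that leaves exactly one 1.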