dplyr: override all but the first occurrence of a value within a group

Question

I have a grouped data_frame with a tag column that takes the values 0 and 1. In each group I need to find the first occurrence of 1 and set all other occurrences to 0. Is there a way to achieve this in dplyr?

For example, take the iris data and add an extra tag column:

 library(dplyr)
 data(iris)
 set.seed(1)
 iris$tag <- sample(c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))
 giris <- iris %>% group_by(Species)

In giris, within the setosa group, I need to keep only the first occurrence of 1 (i.e. in the 4th row) and set the remaining ones to 0. It is like applying a mask of some kind.

Is there a way to do this? I experimented with which and duplicated, but did not succeed. I thought about filtering only the 1s, keeping the first per group, and then joining them back to the remaining rows, but that seems awkward, especially for a 12 GB data set.

Thanks in advance!

r dplyr which

rpl Mar 18 '16 at 8:19

2 answers

We can try:

 res <- giris %>%
   group_by(Species) %>%
   mutate(tag1 = ifelse(cumsum(c(TRUE, diff(tag) < 0)) != 1, 0, tag))
 table(res[c("Species", "tag1")])
 #            tag1
 # Species       0  1
 #   setosa     49  1
 #   versicolor 49  1
 #   virginica  49  1
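One caveat that may be worth checking, shown here on a made-up vector rather than the iris data: the diff-based expression keeps the whole first run of 1s, so if a group's first 1 is immediately followed by another 1, both survive. A minimal base R sketch, where `tag` is a hypothetical single-group vector and the comparison mask is the `cumsum(tag) == 1` idea from the accepted answer:

```r
# Hypothetical tag values for one group; the first 1 is followed directly by another 1
tag <- c(0, 1, 1, 0, 1)

# Expression from the answer above, applied to a single group
run_based <- ifelse(cumsum(c(TRUE, diff(tag) < 0)) != 1, 0, tag)

# Mask that keeps only the very first 1
first_only <- as.integer(tag & cumsum(tag) == 1)

print(run_based)   # 0 1 1 0 0
print(first_only)  # 0 1 0 0 0
```

On the example data in the question the two agree, as the table above shows; the difference only appears when the first 1s in a group are adjacent.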

akrun Mar 18 '16 at 8:27

docendo discimus · Accepted Answer · 2016-03-18T08:29:31+0000

A dplyr option:

 mutate(giris, newcol = as.integer(tag & cumsum(tag) == 1))

Or

 mutate(giris, newcol = as.integer(tag & !duplicated(tag)))
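To see why both expressions pick out only the first 1, it can help to evaluate the pieces on a small vector. The sketch below uses base R only; `tag` is a made-up single-group vector, not part of the original data. Inside a grouped `mutate()` the same expressions are evaluated once per group.

```r
# Made-up tag values for a single group
tag <- c(0, 1, 0, 0, 1, 1)

# Running count of 1s seen so far
running <- cumsum(tag)            # 0 1 1 1 2 3

# TRUE from the first 1 up to (not including) the second 1,
# which also covers the 0s in between
in_first_window <- running == 1   # FALSE TRUE TRUE TRUE FALSE FALSE

# `tag &` drops those in-between 0s, leaving only the first 1
by_cumsum <- as.integer(tag & in_first_window)

# duplicated() is FALSE only at the first 0 and the first 1;
# `tag &` then discards the first 0
by_duplicated <- as.integer(tag & !duplicated(tag))

print(by_cumsum)      # 0 1 0 0 0 0
print(by_duplicated)  # 0 1 0 0 0 0
```

Note that `==` binds more tightly than `&` in R, so `tag & cumsum(tag) == 1` is read as `tag & (cumsum(tag) == 1)`.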

Or, using data.table, the same approach, but modifying by reference:

 library(data.table)
 setDT(giris)
 giris[, newcol := as.integer(tag & cumsum(tag) == 1), by = Species]
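A quick way to sanity-check the result, sketched here in base R so it runs without dplyr or data.table: `ave()` stands in for the grouped mutate (a swapped-in technique, not the one used in the answers), and the helper names `first_tag` and `first_new` are introduced only for this illustration.

```r
# Rebuild the example data from the question
data(iris)
set.seed(1)
iris$tag <- sample(c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))

# Base R stand-in for the grouped mutate: apply the mask within each Species
iris$newcol <- ave(iris$tag, iris$Species,
                   FUN = function(x) as.integer(x & cumsum(x) == 1))

# Each group should contain at most one 1 ...
per_group <- tapply(iris$newcol, iris$Species, sum)
print(per_group)

# ... and it should sit on the first row where tag == 1 in that group
rows <- seq_len(nrow(iris))
first_tag <- tapply(rows[iris$tag == 1], iris$Species[iris$tag == 1], min)
first_new <- tapply(rows[iris$newcol == 1], iris$Species[iris$newcol == 1], min)
print(all(first_tag == first_new))
```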
