Match in a lagged group in data.table

Question

I am trying to create a new column that indicates whether an ID was present in the previous group. Here is my data:

library(data.table)
data <- data.table(ID = c(1:3, c(9, 2, 3, 4), c(5, 1)),
                   groups = rep(c("a", "b", "c"), c(3, 4, 2)))
   ID groups
1:  1      a
2:  2      a
3:  3      a
4:  9      b
5:  2      b
6:  3      b
7:  4      b
8:  5      c
9:  1      c

I am not sure how to refer to the lagged (previous) group. I tried shift, but it does not work:

data[,.(ID=ID,match_lagged=ID %in% shift(ID)),by=groups]
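To see why this attempt falls short, it helps to look at what shift returns inside by = groups. The sketch below reuses the sample data from the question; the extra column name `shifted` and the object name `attempt` are only illustrative. Within each group, shift(ID) is the same group's IDs moved down one row, so ID %in% shift(ID) compares a group with itself and never looks at the previous group.

```r
library(data.table)

data <- data.table(ID = c(1:3, c(9, 2, 3, 4), c(5, 1)),
                   groups = rep(c("a", "b", "c"), c(3, 4, 2)))

# shift() is evaluated separately per group, so it lags ID by one ROW
# inside the same group (padding with NA), not by one GROUP.
attempt <- data[, .(ID = ID,
                    shifted = shift(ID),
                    match_lagged = ID %in% shift(ID)),
                by = groups]
print(attempt)
# In group a, shifted is NA, 1, 2, so IDs 1 and 2 "match" their own group
# and match_lagged comes out TRUE, TRUE, FALSE instead of NA (or FALSE).
```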

Here is my desired result.

The first three rows are not matched because there is no previous group. FALSE would also work for these three rows. ID = 4 (in group b) has no match in group a. ID = 5 (in group c) has no match in group b.

Note that ID 1 in group c has no match in group b, so it must be FALSE even though it exists in group a. That is why duplicated(data$ID) does not work: each group's IDs must be matched against the previous group only.

   groups ID match_lagged
1:      a  1           NA
2:      a  2           NA
3:      a  3           NA
4:      b  9        FALSE
5:      b  2         TRUE
6:      b  3         TRUE
7:      b  4        FALSE
8:      c  5        FALSE
9:      c  1        FALSE
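The remark about duplicated can be checked directly. The short sketch below uses the ID column from the sample data as a plain vector (the name `seen_before` is only illustrative): duplicated flags any ID that has appeared anywhere earlier, so the last row (ID 1 in group c) is flagged even though ID 1 is absent from group b.

```r
# IDs in row order: group a = 1, 2, 3; group b = 9, 2, 3, 4; group c = 5, 1
ID <- c(1, 2, 3, 9, 2, 3, 4, 5, 1)

# TRUE wherever the value occurred in ANY earlier row, regardless of group
seen_before <- duplicated(ID)
print(seen_before)
# FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE
# The final TRUE is the problem: the desired result for that row is FALSE.
```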

A dplyr solution would also be fine.

matching r match data.table dplyr

Pierre Lapointe 22 . '17 21:40

eddi · Answer 1 · 2017-06-26T21:00:54+0000

Number the groups, then check for each ID whether the diff of its consecutive group numbers equals one.

data[, grp.id := .GRP, by = groups]
data[, match_lagged := c(FALSE, diff(grp.id) == 1), by = ID][
     grp.id == 1, match_lagged := NA][]
#   ID groups grp.id match_lagged
#1:  1      a      1           NA
#2:  2      a      1           NA
#3:  3      a      1           NA
#4:  9      b      2        FALSE
#5:  2      b      2         TRUE
#6:  3      b      2         TRUE
#7:  4      b      2        FALSE
#8:  5      c      3        FALSE
#9:  1      c      3        FALSE

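This assumes that each ID appears only once in each group. If that is not the case, you can deduplicate first, apply the above, and then merge the result back in.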
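The diff-based check compares consecutive rows of the same ID, so it relies on each ID appearing at most once per group. A minimal sketch of one way to handle repeated IDs follows: deduplicate, apply the same check, then merge the flag back with an update join. The sample table `dup` and the helper table `u` are hypothetical and not part of the original answer.

```r
library(data.table)

# Hypothetical sample in which ID 2 appears twice in group b
dup <- data.table(ID     = c(1, 2, 3, 9, 2, 2, 4),
                  groups = c("a", "a", "a", "b", "b", "b", "b"))

# 1. Keep one row per (ID, groups) pair
u <- unique(dup[, .(ID, groups)])

# 2. Apply the diff-based check on the deduplicated table
u[, grp.id := .GRP, by = groups]
u[, match_lagged := c(FALSE, diff(grp.id) == 1), by = ID]
u[grp.id == 1, match_lagged := NA]

# 3. Merge the flag back onto every original row (update join)
dup[u, match_lagged := i.match_lagged, on = .(ID, groups)]
print(dup)
```

Both rows for ID 2 in group b receive TRUE here, whereas running the check on `dup` directly would mark the second of them FALSE because its diff is zero.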

user124543131234523 · Answer 2 · 2017-06-23T01:11:38+0000

A dplyr approach that compares each group's IDs with those of the previous group:

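library(dplyr)

# Sample data used in this answer (group b holds IDs 1:4 here)
data <- data.frame(ID = c(1:3, 1:4, c(5, 1)),
                   groups = rep(c("a", "b", "c"), c(3, 4, 2)))

# One row per group, with that group's IDs stored in a list column
z <- data %>% group_by(groups) %>% summarize(all_vals = list(ID))
# IDs of the previous group (missing for the first group, which yields FALSE)
z <- z %>% mutate(lagged_id = lag(all_vals, 1))

# For each group, test its IDs against the previous group's IDs
match_lagged <- lapply(seq_len(nrow(z)), function(x) {
  unlist(z$all_vals[x]) %in% unlist(z$lagged_id[x])
})

# Flatten to one value per row (rows must already be ordered by group)
data$match_lagged <- unlist(match_lagged)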
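The same "look up each ID in the previous group" idea can also be sketched in data.table as an update join. This is an alternative formulation, not taken from either answer, and the helper table `prev` is a hypothetical name. Each row is re-labelled with the next group number, so joining on (ID, grp.id) finds exactly the rows whose ID was present one group earlier. It does not depend on row order within an ID or on IDs being unique per group.

```r
library(data.table)

data <- data.table(ID = c(1:3, c(9, 2, 3, 4), c(5, 1)),
                   groups = rep(c("a", "b", "c"), c(3, 4, 2)))

# Number the groups in order of appearance
data[, grp.id := .GRP, by = groups]

# Each row of prev says "this ID was present in the group before grp.id"
prev <- data[, .(ID, grp.id = grp.id + 1L)]

# Default to FALSE, then flag rows that find a partner in prev
data[, match_lagged := FALSE]
data[prev, match_lagged := TRUE, on = .(ID, grp.id)]

# The first group has no previous group
data[grp.id == 1, match_lagged := NA]
print(data)
```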
