I want to extract the groups, defined by by = list(colA, colB), in which colC contains more than one distinct element. Here is my code:
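library(data.table)

dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3),
                 colB = c(10, 10, 10, 20, 20, 30, 30),
                 colC = c("A", "I", "A", "A", "A", "I", "A"))
dt
# V1 is TRUE for each (colA, colB) group whose colC has more than one distinct value
sg <- dt[, length(unique(colC)) != 1, by = list(colA, colB)]
sg
# keep only the groups flagged TRUE
sg <- sg[sg[, V1]]
sg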
> dt
   colA colB colC
1:    1   10    A
2:    1   10    I
3:    1   10    A
4:    2   20    A
5:    2   20    A
6:    3   30    I
7:    3   30    A
> sg
   colA colB    V1
1:    1   10  TRUE
2:    2   20 FALSE
3:    3   30  TRUE
> sg
   colA colB   V1
1:    1   10 TRUE
2:    3   30 TRUE
Here, the final sg is what I want, but when the number of samples is large, length(unique(colC)) != 1 is slow.
Can you help me speed this up, or suggest a better method to get what I want?
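For reference, here is a minimal sketch of two variants I could try instead. Both rest on assumptions: that data.table's uniqueN() and the by argument of unique() are available in the installed version, and that avoiding a per-group call to length(unique(...)) actually helps. I have not benchmarked either one. The names sg_a, sg_b and cnt are just placeholders for illustration.

```r
library(data.table)

dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3),
                 colB = c(10, 10, 10, 20, 20, 30, 30),
                 colC = c("A", "I", "A", "A", "A", "I", "A"))

# Variant A (assumption: uniqueN() is available): count distinct colC
# values per group, then keep the groups with more than one.
sg_a <- dt[, .(n = uniqueN(colC)), by = .(colA, colB)][n > 1, .(colA, colB)]

# Variant B: drop duplicate (colA, colB, colC) rows first, then count the
# remaining rows per group. A group with more than one remaining row has
# more than one distinct colC value.
cnt  <- unique(dt, by = c("colA", "colB", "colC"))[, .N, by = .(colA, colB)]
sg_b <- cnt[N > 1, .(colA, colB)]
```

Both variants return only the colA and colB columns of the qualifying groups, so they lack the V1 column that my original sg carries.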
Thanks.