R: efficiently extract rows with different elements in the specified column by groups in data.table

Question

I want to extract the groups, defined by by = list(colA, colB), whose colC contains more than one distinct value. Here is my code:

library(data.table)

dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3),
                 colB = c(10, 10, 10, 20, 20, 30, 30),
                 colC = c("A", "I", "A", "A", "A", "I", "A"))
dt
sg <- dt[, length(unique(colC)) != 1, by = list(colA, colB)]
sg

sg <- sg[sg[, V1]]
sg


> dt
    colA colB colC
1:    1   10    A
2:    1   10    I
3:    1   10    A
4:    2   20    A
5:    2   20    A
6:    3   30    I
7:    3   30    A

> sg
    colA colB    V1
1:    1   10  TRUE
2:    2   20 FALSE
3:    3   30  TRUE

> sg
   colA colB   V1
1:    1   10 TRUE
2:    3   30 TRUE

Here, the final sg is what I want, but when the number of rows is large, length(unique(colC)) != 1 is slow.

How can I speed this up, or is there a better way to get the same result?

Thanks.

r data.table

Biochemoinformatics Feb 18 '15 at 15:08

1 answer

Biochemoinformatics · Accepted Answer · 2015-02-18T18:54:32+0000

@Arun gave the best answer. It's perfect! Thanks.

sg <- unique(dt)[, .N != 1L, by=.(colA, colB)][(V1)]
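
For anyone comparing the two approaches, here is a minimal sketch run on the sample data from the question. It relies on unique(dt) deduplicating across all columns, so it assumes dt holds only colA, colB and colC (with extra columns you would probably want unique(dt, by = c("colA", "colB", "colC")) instead). The uniqueN() variant and the final join back to dt are my own additions for illustration, not part of @Arun's answer, and the names sg_orig, sg_fast, sg_alt and rows are just placeholders.

```r
library(data.table)

dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3),
                 colB = c(10, 10, 10, 20, 20, 30, 30),
                 colC = c("A", "I", "A", "A", "A", "I", "A"))

# Approach from the question: count distinct colC values inside every group.
sg_orig <- dt[, length(unique(colC)) != 1, by = list(colA, colB)][(V1)]

# Approach from the answer: drop duplicate rows once, then a group that
# still has more than one row must contain more than one colC value.
# Assumes dt has no columns other than colA, colB and colC.
sg_fast <- unique(dt)[, .N != 1L, by = .(colA, colB)][(V1)]

# Possible alternative (not from the answer): data.table's uniqueN().
sg_alt <- dt[, uniqueN(colC) != 1L, by = .(colA, colB)][(V1)]

# Optional follow-up: pull the original rows that belong to those groups.
rows <- dt[sg_fast[, .(colA, colB)], on = c("colA", "colB")]
```

All three versions should select the same groups on this data. Which one is fastest on a large table likely depends on the number of groups and the data.table version, so it is worth timing them on your own data.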
