Grouping factors in a data table.

Question

I'm trying to combine factor levels in a data.table and wonder whether there is a data.table-y way to do this.

Example:

library(data.table)
DT = data.table(id = 1:20, ind = as.factor(sample(8, 20, replace = TRUE)))

I want to say that levels 1, 3 and 8 are in group A; 2 and 4 are in group B; and 5, 6 and 7 are in group C.

Here is what I did, which was pretty slow in the full version of the problem:

DT[ind %in% c(1, 3, 8), grp := as.factor("A")]
DT[ind %in% c(2, 4), grp := as.factor("B")]
DT[ind %in% c(5, 6, 7), grp := as.factor("C")]

Another approach suggested by this related question could be translated as follows:

DT[ , grp := ind]
levels(DT$grp) = c("A", "B", "A", "B", "C", "C", "C", "A")

Or maybe this, which seems a bit neater given that I have 65 base groups and 18 aggregated groups:

DT[ , grp := ind]
lev <- letters[1:8]
lev[c(1, 3, 8)] <- "A"
lev[c(2, 4)] <- "B"
lev[5:7] <- "C"
levels(DT$grp) <- lev

Both of these seem bulky. Is there a more appropriate way to do this in data.table?

The full data set has 10,000,000 rows …

(Keying DT …)


r data.table

MichaelChirico 27 . '15 23:10


MichaelChirico · Accepted Answer · 2015-07-28T14:26:35+0000

Update:

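According to ?levels, you can also assign a named list to levels, where each name is a new level and its elements are the old levels it absorbs: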

levels(DT$ind) = list(A = c(1, 3, 8), B = c(2, 4), C = 5:7)
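A minimal base-R sketch of what the list form does, using a plain factor as a hypothetical stand-in for DT$ind. Level "2" is deliberately absent from the data, as can happen with sample(8, 20, replace = TRUE). The list form matches the numbers against the existing labels, so the missing level is simply skipped; the positional vector from the question assigns new labels by level position, so here the labels shift and several rows are mislabelled without an error.

```r
# Hypothetical stand-in for DT$ind; level "2" never occurs,
# so the factor has only 7 levels: "1" "3" "4" "5" "6" "7" "8".
ind <- factor(c(1, 3, 8, 4, 5, 6, 7, 1))

# List form (see ?levels): each name lists the old labels it absorbs.
# The numbers are compared with the labels "1", "3", ..., so an
# absent level (here 2) is ignored rather than shifting anything.
by_list <- ind
levels(by_list) <- list(A = c(1, 3, 8), B = c(2, 4), C = 5:7)

# Positional form: the i-th new label goes to the i-th *existing* level.
# With only 7 levels present, "3" gets the label meant for "2", and so on.
by_position <- ind
levels(by_position) <- c("A", "B", "A", "B", "C", "C", "C", "A")

print(data.frame(ind, by_list, by_position))
```

The list form also documents the mapping by name, which scales better to many aggregated groups than counting positions.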

Original answer:

Following a suggestion from @Arun, a data.table approach is to build a lookup table and join it to DT:

match_dt = data.table(ind = as.factor(1:12),
                      grp = as.factor(c("A", "B", "A", "B", "C", "C",
                                        "C", "A", "D", "E", "F", "D")))
setkey(DT, ind)
setkey(match_dt, ind)
DT = match_dt[DT]
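On newer data.table versions (assuming 1.9.6 or later, where the on= argument exists), the same lookup can be applied as an update-on-join. This adds grp to DT by reference and keeps DT's row order, instead of building a new table with match_dt[DT] and assigning it back. A sketch with small hypothetical data:

```r
library(data.table)

# Toy data with fixed values instead of sample(), so the result is reproducible
DT <- data.table(id = 1:8, ind = factor(c(1, 3, 8, 4, 5, 6, 7, 1)))

# Lookup table: one row per base level, mapping it to its aggregated group
match_dt <- data.table(ind = factor(1:8),
                       grp = factor(c("A", "B", "A", "B", "C", "C", "C", "A")))

# Update-on-join: for each row of DT, look up its ind in match_dt and copy
# that row's grp (i.grp) into a new column of DT, by reference.
# The lookup row for level "2" has no match in DT and is simply unused.
DT[match_dt, grp := i.grp, on = "ind"]
print(DT)
```

Neither table needs to be keyed for this form, since the join columns are named in on=.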

Or, building the grp column of the lookup table by index assignment, as in the question:

lev <- letters[1:12]  # named lev to avoid masking base::levels
lev[c(1, 3, 8)] <- "A"
lev[c(2, 4)] <- "B"
lev[5:7] <- "C"
lev[c(9, 12)] <- "D"
lev[10] <- "E"
lev[11] <- "F"
match_dt <- data.table(ind = as.factor(1:12),
                       grp = as.factor(lev))
setkey(DT, ind)
setkey(match_dt, ind)
DT = match_dt[DT]
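With 65 base levels and 18 groups, it may be less error-prone to write the mapping once as a named list and derive the lookup table from it, rather than assigning labels by index. A sketch, where groups is a hypothetical name holding the same 12-level mapping as the index-assignment code above:

```r
library(data.table)

# Hypothetical input: each aggregated group name lists the base levels it contains
groups <- list(A = c(1, 3, 8), B = c(2, 4), C = 5:7,
               D = c(9, 12), E = 10, F = 11)

# One row per base level. rep() repeats each group name once per member,
# so the ind and grp columns line up element by element.
match_dt <- data.table(
  ind = factor(unlist(groups, use.names = FALSE)),
  grp = factor(rep(names(groups), lengths(groups)))
)
print(match_dt)
```

match_dt can then be keyed and joined to DT exactly as before.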
