Is there a way to not return the `by` column as the first column in the data.table group

Question

When I group with `by` in data.table, the `by` column is always returned as the first column. Is there a flag or option to turn this off, or a smart way to get rid of it?

In particular, I want to group and then rbindlist the result onto my original table, so the problem could equally be phrased as "how to stop it reordering columns".

For example:

 DT = data.table(I = as.numeric(1:6), N = rnorm(6), L = rep(c("a", "b", "c"), 2))
 DT[, list(I = mean(I), N = mean(N)), by= L]
 DT

gives:

 > DT[, list(I = mean(I), N = mean(N)), by= L]
    L   I          N
 1: a 2.5  0.4291802
 2: b 3.5  0.6669517
 3: c 4.5 -0.6471886
 > DT
    I          N L
 1: 1  1.8460998 a
 2: 2  0.7093438 b
 3: 3 -1.7991193 c
 4: 4 -0.9877394 a
 5: 5  0.6245596 b
 6: 6  0.5047421 c

Regarding the rbindlist request, it would be nice to be able to do this:

 DT = rbindlist(list(DT, DT[, list(I = mean(I), N = mean(N)), by= L]))

or maybe

 DT = rbindlist(list(DT, DT[, list(I = mean(I), N = mean(N), L), by= L]))

or something similar (neither of these works).

r data.table

Corone Feb 23 '13 at 10:57

1 answer

Arun · Accepted Answer · 2013-02-23T11:00:27+0000

I don't particularly like this automatic column reordering either. The usual "trick" is to call setcolorder on the result, as follows:

 DT <- data.table(I = 1:6, N = rnorm(6), L = rep(c("a", "b", "c"), 2))
 DT.out <- DT[, list(I = mean(I), N = mean(N)), by= L]

Then apply setcolorder:

 setcolorder(DT.out, names(DT))
 #      I            N L
 # 1: 2.5  0.772719306 a
 # 2: 3.5 -0.008921738 b
 # 3: 4.5 -0.770807996 c

Of course, this only works if the names of DT are the same as those of DT.out. Otherwise, you will have to specify the column order explicitly:

 setcolorder(DT.out, c("I", "N", "L"))
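Putting these pieces together for the question's use case, a minimal sketch might look like the following. The names `agg` and `combined` are illustrative, and `set.seed` is added only to make the example reproducible; neither comes from the original post.

```r
library(data.table)

set.seed(1)  # assumption: only here so the example is reproducible
DT <- data.table(I = as.numeric(1:6), N = rnorm(6), L = rep(c("a", "b", "c"), 2))

# Grouped means; the by column L comes back as the first column
agg <- DT[, list(I = mean(I), N = mean(N)), by = L]

# Reorder by reference so the columns match DT: I, N, L
setcolorder(agg, names(DT))

# The column positions now line up, so binding by position is safe
combined <- rbindlist(list(DT, agg))
```

Because setcolorder modifies `agg` by reference, no copy of the grouped result is made before it is bound to the original table.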

Edit: Since you want to bind the rows right away, it would indeed be nicer not to need an intermediate result. Because rbindlist appears to bind by position, you can use rbind instead, which binds by column name. data.table emits a warning when the names are in a different order and suggests use.names=FALSE if you want to bind by position; you can safely ignore this warning.

 dt1 <- data.table(x=1:5, y=6:10)
 dt2 <- data.table(y=1:5, x=6:10)
 rbind(dt1, dt2) # or do.call(rbind, list(dt1, dt2))
 #      x  y
 #  1:  1  6
 #  2:  2  7
 #  3:  3  8
 #  4:  4  9
 #  5:  5 10
 #  6:  6  1
 #  7:  7  2
 #  8:  8  3
 #  9:  9  4
 # 10: 10  5
 # Warning message:
 # In .rbind.data.table(...) :
 #   Argument 2 has names in a different order. Columns will be bound by name for
 #   consistency with base. Alternatively, you can drop names (by using an unnamed
 #   list) and the columns will then be joined by position. Or, set use.names=FALSE.
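Later data.table releases added a `use.names` argument to rbindlist itself. Assuming a version that has it (it was not available when this answer was written), a sketch that binds by name in one step, with no intermediate reordering, might look like this. The name `combined` is illustrative.

```r
library(data.table)

DT <- data.table(I = as.numeric(1:6), N = rnorm(6), L = rep(c("a", "b", "c"), 2))

# use.names = TRUE matches columns by name, so it does not matter that
# the grouped result has L as its first column
combined <- rbindlist(
  list(DT, DT[, list(I = mean(I), N = mean(N)), by = L]),
  use.names = TRUE
)
```

The result takes its column order from the first list element, so `combined` keeps the I, N, L layout of the original table.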
