Group a data table using a column that is a list

Question

I have a large data set, and looping over the data.table to do what I want is too slow, so I'm trying to avoid the loop. Suppose I have a data.table as follows:

 library(data.table)
 a <- data.table(i = c(1,2,3), j = c(2,2,6), k = list(c("a","b"),c("a","c"),c("b")))
 > a
    i j   k
 1: 1 2 a,b
 2: 2 2 a,c
 3: 3 6   b

And I want to group based on values in k. So something like this:

 a[, sum(j), by = k]

Right now I am getting the following error:

  Error in `[.data.table`(a, , sum(i), by = k) : The items in the 'by' or 'keyby' list are length (2,2,1). Each must be same length as rows in x or number of rows returned by i (3).

What I want is to take all the rows that have "a" in column k and compute sum(j), then do the same for all the rows that have "b", and so on. So the desired answer is:

 k V1
 a  4
 b  8
 c  2

Any hints on how to do this efficiently? I cannot melt column k by repeating rows, since the resulting data.table would be too large in my case.

+6

r data.table

newbie Jul 31 '16 at 15:16

3 answers

With tidyr, a compact option would be:

 library(tidyr)
 unnest(a, k)[, sum(j) ,k]
 #   k V1
 #1: a  4
 #2: b  8
 #3: c  2

Or using dplyr/tidyr

 library(dplyr)
 unnest(a, k) %>%
   group_by(k) %>%
   summarise(V1 = sum(j))
 #      k    V1
 #  <chr> <dbl>
 #1     a     4
 #2     b     8
 #3     c     2

+4

akrun Jul 31 '16 at 16:20

Since grouped operations can be slow, I would consider ...

 dat = a[rep(1:.N, lengths(k)), c(.SD, .(k = unlist(a$k))), .SDcols=setdiff(names(a), "k")]

    i j k
 1: 1 2 a
 2: 1 2 b
 3: 2 2 a
 4: 2 2 c
 5: 3 6 b

We repeat the rows of columns i:j to match the unlisted k. The data should be stored in this long format rather than in list columns. From there, as in @MikeyMike's answer, we can use dat[, sum(j), by=k].

In data.table 1.9.7+ we can do it in a similar way:

 dat = a[, c(.SD[rep(.I, lengths(k))], .(k = unlist(k))), .SDcols=i:j]
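
For reference, the long-format idea above can be sketched end to end as follows. This is only a minimal illustration on the example table from the question; the names `idx` and `res` are made up for this sketch, and it uses `:=` to attach the flattened column instead of the `c(.SD, ...)` form shown above.

```r
library(data.table)

# Example table from the question
a <- data.table(i = c(1, 2, 3), j = c(2, 2, 6),
                k = list(c("a", "b"), c("a", "c"), c("b")))

# Repeat each row index once per element of that row's k entry
idx <- rep(seq_len(nrow(a)), lengths(a$k))

# Long table: the non-list columns repeated, plus the flattened k
dat <- a[idx, .(i, j)]
dat[, k := unlist(a$k)]

# Ordinary grouped sum on the long table
res <- dat[, .(V1 = sum(j)), by = k]
print(res)
```

Once the data is stored as `dat`, any further grouping on k is a plain `by = k` call with no list column involved.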

+2

Frank Aug 1 '16 at 14:13

Mike H. · Accepted Answer · 2016-07-31T15:36:24+0000

I think this might work:

 a[, .(k = unlist(k)), by=.(i,j)][,sum(j),by=k]
    k V1
 1: a  4
 2: b  8
 3: c  2
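
If memory is the main concern, one possible variant (a sketch, not part of the original answer) is to expand only the column being summed, so that i and any other columns are never copied. The names `long` and `res` are made up for this illustration. It still creates one row per element of k, so whether it fits in memory depends on the actual data.

```r
library(data.table)

# Example table from the question
a <- data.table(i = c(1, 2, 3), j = c(2, 2, 6),
                k = list(c("a", "b"), c("a", "c"), c("b")))

# Flatten k and repeat only j to match; no other column is expanded
long <- data.table(k = unlist(a$k),
                   j = rep(a$j, lengths(a$k)))

# Single grouped sum on the two-column table
res <- long[, .(V1 = sum(j)), by = k]
print(res)
```

This also avoids the first `by=.(i,j)` pass, since `unlist` and `rep` are applied to whole columns at once.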
