Unlist a list column in data.table

In my table, some cells hold vectors instead of single values, i.e. the column is a list instead of an atomic vector:

    dt1 <- data.table(
      colA = c('A1', 'A2', 'A3'),
      colB = list('B1', c('B2a', 'B2b'), 'B3'),
      colC = c('C1', 'C2', 'C3'),
      colD = c('D1', 'D2', 'D3')
    )
    dt1
    #   colA    colB colC colD
    #1:   A1      B1   C1   D1
    #2:   A2 B2a,B2b   C2   D2
    #3:   A3      B3   C3   D3

I need to reshape it to long format by unlisting the colB column. So far I am doing it like this:

    dt1[, .(colB = unlist(colB)), by = .(colA, colC, colD)]
    #   colA colC colD colB
    #1:   A1   C1   D1   B1
    #2:   A2   C2   D2  B2a
    #3:   A2   C2   D2  B2b
    #4:   A3   C3   D3   B3

It does the job, but I don't like having to explicitly list all the other column names in by=. Is there a better way to do this?
(I'm sure this has already been answered elsewhere, but I could not find it so far.)

PS: Ideally, I would like to do this without any external packages.

r data.table
2 answers

Promoting my comment to an answer. Using:

 dt1[,.(colB = unlist(colB)), by = setdiff(names(dt1), 'colB')] 

gives:

       colA colC colD colB
    1:   A1   C1   D1   B1
    2:   A2   C2   D2  B2a
    3:   A2   C2   D2  B2b
    4:   A3   C3   D3   B3

Or as an alternative (slight change to @Frank's suggestion):

 dt1[rep(dt1[,.I], lengths(colB))][, colB := unlist(dt1$colB)][] 
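
To see why the second form works, it may help to split it into steps. The sketch below rebuilds dt1 from the question; the intermediate names n, idx and long are only illustrative and not part of the original one-liner.

```r
library(data.table)

dt1 <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list('B1', c('B2a', 'B2b'), 'B3'),
  colC = c('C1', 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

# lengths() returns the number of elements in each list cell
n <- lengths(dt1$colB)

# repeat every row index once per element of its cell
idx <- rep(seq_len(nrow(dt1)), n)

# subsetting by idx creates a new table, so dt1 itself is left untouched
long <- dt1[idx]

# replace the still list-typed colB with the flattened values
long[, colB := unlist(dt1$colB)]
```

Unlike the by= version, this keeps colB in its original position. In both versions a row whose colB cell is empty (length 0) gets zero repeats and therefore drops out of the result.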

I think @Jaap's answer is the simplest, but here is another alternative to chew on:

    # create ID column
    dt1[ , ID := .I]

    # unnest colB, keep ID column
    dt_unnest = dt1[ , .(ID = rep(ID, lengths(colB)), colB = unlist(colB))]

    # merge
    dt_unnest = dt_unnest[dt1[ , !'colB'], on = 'ID']
    #    ID colB colA colC colD
    # 1:  1   B1   A1   C1   D1
    # 2:  2  B2a   A2   C2   D2
    # 3:  2  B2b   A2   C2   D2
    # 4:  3   B3   A3   C3   D3
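
Note that ID := .I adds the helper column to dt1 by reference, so it stays there after the merge. If that is unwanted, something like the following cleanup could be appended; this is only a sketch that repeats the steps above on the example data and then drops the helper column from both tables.

```r
library(data.table)

dt1 <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list('B1', c('B2a', 'B2b'), 'B3'),
  colC = c('C1', 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

# same steps as above: ID column, unnest, merge
dt1[, ID := .I]
dt_unnest <- dt1[, .(ID = rep(ID, lengths(colB)), colB = unlist(colB))]
dt_unnest <- dt_unnest[dt1[, !'colB'], on = 'ID']

# drop the helper column from the result and from the original table
dt_unnest[, ID := NULL]
dt1[, ID := NULL]

# optional: put the columns back in the original order
setcolorder(dt_unnest, names(dt1))
```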
