Consider the following data set:
dt <- structure(list(lllocatie = structure(c(1L, 6L, 2L, 4L, 3L), .Label = c("Assen", "Oosterwijtwerd", "Startenhuizen", "t-Zandt", "Tjuchem", "Winneweer"), class = "factor"), lat = c(52.992, 53.32, 53.336, 53.363, 53.368), lon = c(6.548, 6.74, 6.808, 6.765, 6.675), mag.cat = c(3L, 2L, 1L, 2L, 2L), places = structure(c(2L, 4L, 5L, 6L, 3L), .Label = c("", "Amen,Assen,Deurze,Ekehaar,Eleveld,Geelbroek,Taarlo,Ubbena", "Eppenhuizen,Garsthuizen,Huizinge,Kantens,Middelstum,Oldenzijl,Rottum,Startenhuizen,Toornwerd,Westeremden,Zandeweer", "Loppersum,Winneweer", "Oosterwijtwerd", "t-Zandt,Zeerijp"), class = "factor")), .Names = c("lllocatie", "lat", "lon", "mag.cat", "places"), class = c("data.table", "data.frame"), row.names = c(NA, -5L))
When I want to split the comma-separated values in the last column into separate rows, I use (with data.table version 1.9.5+):
dt.new <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=list(lllocatie,lat,lon,mag.cat)]
However, when I use:
dt.new2 <- dt[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by=lllocatie]
I get the same result, except that all non-grouping columns are coerced to character. For small datasets it is not a big problem to list the columns that should not be split in the by argument, but for datasets with many columns this becomes impractical. I know it is possible to do this with the splitstackshape package (as @ColonelBeauvel mentioned in his answer), but I am looking for a data.table solution because I want to chain more operations onto it.
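To make the type change visible, the column classes of the two results can be compared. This is a minimal sketch under assumptions: it uses a shortened two-row stand-in for the data above (with abbreviated places values), and the helper name split_col is only for illustration.

```r
library(data.table)

# Two-row stand-in for the data set above (abbreviated, illustrative only)
dt <- data.table(
  lllocatie = factor(c("Assen", "Winneweer")),
  lat       = c(52.992, 53.32),
  lon       = c(6.548, 6.74),
  mag.cat   = c(3L, 2L),
  places    = factor(c("Amen,Assen", "Loppersum,Winneweer"))
)

# Hypothetical helper: same anonymous function as in the calls above
split_col <- function(x) unlist(tstrsplit(x, ",", fixed = TRUE))

# All columns that should stay intact are listed in by
dt.new  <- dt[, lapply(.SD, split_col), by = list(lllocatie, lat, lon, mag.cat)]

# Only one grouping column: lat, lon and mag.cat end up in .SD and
# pass through tstrsplit, which converts its input to character
dt.new2 <- dt[, lapply(.SD, split_col), by = lllocatie]

# Compare the column classes of both results
print(sapply(dt.new, class))
print(sapply(dt.new2, class))
```

In the second result, lat, lon and mag.cat should be reported as character, because they are part of .SD and are therefore passed through tstrsplit along with places.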
How can I prevent this without manually specifying in by the variables that should not be split?