data.table: Using with = FALSE together with a transform/summary function?

Question


I want to summarize several variables in a data.table, with the output in a wide format; returning a list with one element per variable would also be fine. Since several other approaches did not work, I tried an outer lapply that passes the variable names as character vectors, and I wanted to use them in j via with = FALSE.

    carsx = as.data.table(cars)
    lapply(list(speed = "speed", dist = "dist"), # error: object 'ansvals' not found
           function(x) carsx[, list(mean(x), min(x), max(x)), with = FALSE])

Since this does not work, I tried a simpler approach without lapply:

 carsx[,list(mean("speed"), min("speed"), max("speed") ), with=FALSE ] #error object 'ansvals' not found

This does not work either. Is there a way to do something like this? Is this behavior intended? (I know that ?data.table only mentions with = FALSE for selecting columns, but in my case it would be useful to be able to transform them as well.)

When with = FALSE, j is a vector of column names or positions to select, similar to a data.frame. with = FALSE is often useful in data.table for selecting columns dynamically.
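A minimal sketch of what with = FALSE does and does not do, assuming the data.table package is installed and using R's built-in `cars` data: j is only a list of names to select, so one workaround is to select the columns first and summarise them afterwards. The names `cols`, `selected`, and `summary_mat` are illustrative, not from the original post.

```r
library(data.table)

carsx <- as.data.table(cars)
cols <- c("speed", "dist")

# with = FALSE: j is treated as a character vector of column names to select;
# it is not evaluated as an expression, so no summary is computed here.
selected <- carsx[, cols, with = FALSE]

# Workaround: select first, then summarise each selected column.
# sapply() over the data.table (a list of columns) returns a 3 x 2 matrix.
summary_mat <- sapply(selected, function(v) c(mean = mean(v), min = min(v), max = max(v)))
print(summary_mat)
```

This gives ungrouped statistics only; grouping still needs j to be evaluated, i.e. with = TRUE.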

EDIT: My goal is to get a summary for each group, in a wide format, for several variables. I tried to extend the following, which only works for one variable, to a list of variables:

    carsx[, list(mean(speed), min(speed), max(speed)), by = (dist > 50)]
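One way to extend this to a column whose name is held in a string is `get()` inside j, which looks the name up among the data.table's columns. The sketch below assumes data.table is installed; `summarise_by_group`, `var_name`, and the group column name `big` are hypothetical names chosen for illustration.

```r
library(data.table)

carsx <- as.data.table(cars)

# Hypothetical helper: grouped mean/min/max of one column given by name,
# grouped on dist > 50 as in the question.
summarise_by_group <- function(dt, var_name) {
  dt[, list(mean = mean(get(var_name)),
            min  = min(get(var_name)),
            max  = max(get(var_name))),
     by = list(big = dist > 50)]
}

# An outer lapply over named strings gives one wide table per variable.
res <- lapply(c(speed = "speed", dist = "dist"),
              function(v) summarise_by_group(carsx, v))
print(res)
```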

Unfortunately, SO doesn't let me post my other question, where I described that I needed output similar to:

 lapply( list(speed="speed",dist= "dist"), function(x) do.call("as.data.frame", aggregate(cars[,x], list(class=cars$dist>50), FUN=summary) ) )

The expected result will be something like this:

    $speed
                 V1 V2 V3
    1: FALSE 12.96970  4 20
    2:  TRUE 20.11765 14 25

    $dist
                 V1 V2 V3
    1: FALSE 12.96970  4 20
    2:  TRUE 20.11765 14 25


r data.table summarization lapply

Julian Nov 10 '14 at 12:50


2 answers

You can specify columns with the .SDcols parameter:

    carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), .SDcols = c("speed", "dist")]
    #    speed   dist
    # 1:  15.4  42.98
    # 2:   4.0   2.00
    # 3:  25.0 120.00

    carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), .SDcols = "speed"]
    #    speed
    # 1:  15.4
    # 2:   4.0
    # 3:  25.0
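The same .SDcols idea can also be combined with `by` to get one wide row per group in a single call. This is a sketch, assuming data.table is installed and that columns used in the `by` expression may also be listed in .SDcols (the accepted answer below relies on the same thing); the group column name `big` is illustrative.

```r
library(data.table)

carsx <- as.data.table(cars)
cols <- c("speed", "dist")

# lapply() yields one named list (mean/min/max) per column in .SDcols;
# unlist() flattens that to names like "speed.mean", and as.list() turns
# the result into one column per statistic, so each group is a single row.
wide <- carsx[, as.list(unlist(lapply(.SD, function(v)
                  list(mean = mean(v), min = min(v), max = max(v))))),
              by = list(big = dist > 50), .SDcols = cols]
print(wide)
```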


Sven Hohenstein Nov 10 '14 at 13:06


Julian · Accepted Answer · 2014-11-11T08:25:18+0000

Based on Sven's answer, a combination of .SDcols, rbindlist, and an outer and an inner lapply did the trick. The inner lapply is needed to access .SD.

    lapply(list(speed = "speed", dist = "dist"),
           function(x) carsx[ , rbindlist(lapply(.SD, function(x)
                                   list(mean = mean(x), min = min(x), max = max(x)))),
                             .SDcols = x, by = (dist > 50)])

Result:

    $speed
        dist     mean min max
    1: FALSE 12.96970   4  20
    2:  TRUE 20.11765  14  25

    $dist
        dist     mean min max
    1: FALSE 27.84848   2  50
    2:  TRUE 72.35294  52 120
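Because `.SDcols = x` selects exactly one column per outer call, .SD has a single column there, so the inner lapply plus rbindlist could be replaced by taking that column directly with `.SD[[1L]]`. A minimal sketch of that variant, assuming data.table is installed; `col`, `val`, and the group name `big` are illustrative names.

```r
library(data.table)

carsx <- as.data.table(cars)

# .SD holds only the one column named by col, so .SD[[1L]] is that column;
# the braced j returns a named list, giving one row per group.
res <- lapply(c(speed = "speed", dist = "dist"), function(col)
  carsx[, {
    val <- .SD[[1L]]
    list(mean = mean(val), min = min(val), max = max(val))
  }, by = list(big = dist > 50), .SDcols = col])
print(res)
```

The design trade-off is that this version only handles one column per call, whereas the inner lapply in the accepted answer would also work if `.SDcols` listed several columns.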
