Column selection using function in j-environment

Consider the following column selection in data.table:

 library(data.table)  # using 1.8.7 from r-forge
 dt <- data.table(a = 1:5, b = i <- rnorm(5), c = pnorm(i))
 dt[, list(a,b)]  # ok

To streamline my code in some calculations with many and variable columns, I want to replace list(a,b) with a function. Here is the first attempt:

 .ab <- function() quote(list(a, b))
 dt[, eval(.ab())]  # ok - same as above

Ideally, I would like to remove eval() from the call to [.data.table and confine it to the definition of .ab, without passing the data table dt to the function:

 .eab <- function() eval(quote(list(a, b)))
 dt[, .eab()]
 # Error in eval(expr, envir, enclos) : object 'b' not found

What's happening? How can this be fixed?

I suspect that I was bitten by R's lexical scoping and by the fact that the correct evaluation of list(a,b) relies on being inside the j-environment of the data table dt. Alas, I do not know how to get a reference to the correct environment and pass it as envir or enclos to eval().

 # .eab <- function() eval(quote(list(a, b)), envir = ?, enclos = ?) 
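
To make the scoping issue concrete, here is a minimal base R sketch. It uses a plain data.frame named df as a stand-in for the data table (an assumption for illustration, this is not how [.data.table works internally), and the helper name f is hypothetical.

```r
# Stand-in for the data table: a plain data.frame (illustration only).
df <- data.frame(a = 1:3, b = c(10, 20, 30))
expr <- quote(list(a, b))

# Evaluating the expression with the data as envir finds the columns.
ok <- eval(expr, envir = df)

# Wrapping eval() in a function changes where lookup starts: with no envir
# argument, eval() uses the function's own frame, whose enclosure is the
# environment where the function was defined, not the caller's data.
f <- function() eval(quote(list(a, b)))

# Assuming no variables named a and b exist where f was defined, the lookup
# fails and res holds the error message instead of a list.
res <- tryCatch(with(df, f()), error = function(e) conditionMessage(e))
```

Because R resolves free variables through the defining environment rather than the calling one, the function never sees the columns unless an environment containing them is passed to eval() explicitly.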

EDIT

This approach almost works:

 .eab <- function(e) eval(quote(list(a, b)), envir = e)
 dt[, .eab(dt)]

There are two drawbacks: (1) column names are not returned, and (2) dt has to be passed explicitly (which I would prefer to avoid). I would also prefer not to hardcode dt as the environment of choice. These considerations lead to an alternative formulation of the question above: is there a programmatic way to get the dt environment from within .eab?

2 answers

Warning: this may be non-robust, slow, and/or break if the internals of [.data.table change. But if for some reason you cannot work around it, here is a function that seems to meet your requirements. I would also guess that it stops working once you use other arguments of [.data.table, such as by.

 .eab <- function() {
   foo <- quote(list(a,b))
   ans <- eval(foo, envir = parent.frame(3)$x)
   names(ans) <- vapply(as.list(foo)[-1], deparse, character(1))
   ans
 }
 identical(dt[, .eab()], dt[, list(a,b)])  # TRUE

Again, this subverts / bypasses a large amount of code that exists for good reason.


The intended approach is to create an expression, not a function.

 DT[, list(a,b), by=...]   # ok
 .ab = quote(list(a, b))   # simpler here, no need for function()
 DT[, eval(.ab), by=...]   # same

This approach is one reason grouping is fast in data.table: j is evaluated in a static environment for all groups, which avoids the (small) overhead of a function call per group.
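
Since the question mentions many and variable columns, the quoted expression can also be built from a character vector instead of being typed out. The following is a minimal base R sketch; the names cols, sel and df are hypothetical, and the data.frame is only a stand-in so the snippet runs without data.table.

```r
cols <- c("a", "b")  # hypothetical column names chosen at run time

# Turn "list" and each column name into symbols, then into one call.
# The result is the same call as quote(list(a, b)).
sel <- as.call(lapply(c("list", cols), as.name))

# Stand-in data (illustration only): evaluate the call against it.
df <- data.frame(a = 1:3, b = c(10, 20, 30))
res <- eval(sel, envir = df)
```

With a data table, sel would then be used in the same way as .ab above, i.e. DT[, eval(sel)].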

But if .ab really needs to be a function for some reason, then that can certainly be given further thought.
