Use a list of functions with dplyr::summarise_each_

I would like to apply a list of programmatically selected functions to each column of a data frame using dplyr. For illustration, here is my list of functions:

 fun_list <- lapply(iris[-5], function(x) if(var(x) > 0.7) median else mean) 
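For reference, the selection rule above can be checked with base R alone. This is only a sketch: `picked` is a hypothetical name introduced here, and each entry is labelled by comparing it to `median` with `identical()`.

```r
# Same list as above: median for high-variance columns, mean otherwise.
fun_list <- lapply(iris[-5], function(x) if (var(x) > 0.7) median else mean)

# `picked` is a hypothetical helper; it labels each entry by identity.
picked <- vapply(
  fun_list,
  function(f) if (identical(f, median)) "median" else "mean",
  character(1)
)
print(picked)
```

With the built-in iris data, only Petal.Length has a variance above 0.7, so it is the only column paired with median.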

I thought this would work:

 iris %>% group_by(Species) %>% summarise_each_(funs_(fun_list), names(iris)[-5]) 

based on ?funs_, which says the arguments should be, among other things:

A list of functions specified by ... The function itself, mean

But this fails with an error:

 Error in UseMethod("as.lazy") : no applicable method for 'as.lazy' applied to an object of class "function" 

It seems funs_ actually expects a list of character strings naming functions defined in the appropriate environment, rather than the functions themselves. In my application, however, I only get the functions, not their symbol names (besides, the functions may be anonymous).

Is there a way to pass the actual functions to dplyr? Note: I am specifically looking for a dplyr answer, as I know how to solve this problem with other tools.

r dplyr
1 answer

If fun_list is a list of functions, you can convert it to a list of "lazy objects" before using it in dplyr functions.

 library(lazyeval)
 fun_list2 <- lapply(fun_list, function(f) lazy(f(.)))

or

 fun_list2 <- lapply(fun_list, function(f) lazy_(quote(f), env = environment())) 

But I'm not sure this approach is 100% watertight.
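To see what such a lazy object holds, here is a minimal sketch assuming the lazyeval package is installed. It captures the call `f(.)` together with the environment in which `f` lives, then evaluates it with `lazy_eval()` while binding `.` to a vector. The names `f`, `lz` and `result` are illustrative only.

```r
library(lazyeval)

f <- median
# Capture the call f(.) together with the environment where f is defined.
lz <- lazy_(quote(f(.)), env = environment())

# Evaluate the captured call with `.` bound to a concrete vector.
result <- lazy_eval(lz, list(. = c(1, 2, 10)))
print(result)
```

Because the expression and its environment travel together, the function does not need a globally visible name, which is why this also covers anonymous functions.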

Update

Based on the comments (one function per column):

 dispatch <- lazy_(quote((fun_list[[as.character(substitute(.))]](.))), env = environment())
 iris %>% group_by(Species) %>% summarise_each_(funs_(dispatch), names(iris)[-5])

The idea is to use summarise_each_ not with a list of functions, but with a single dispatch function. This function takes a variable, looks up the matching function in the original fun_list (by the variable's name!) and applies it to that variable.

The solution works if the names of the function list match the names of the variables.
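The same dispatch-by-name idea can be sketched with newer dplyr verbs. This is not part of the original answer: it assumes dplyr 1.0.0 or later, where `across()` and `cur_column()` are available, and `res` is a name introduced here for illustration.

```r
library(dplyr)

fun_list <- lapply(iris[-5], function(x) if (var(x) > 0.7) median else mean)

# For each column, look up the function stored under that column's name
# and apply it to the column within each group.
res <- iris %>%
  group_by(Species) %>%
  summarise(across(all_of(names(fun_list)), ~ fun_list[[cur_column()]](.x)))
print(res)
```

As with the lazy dispatch above, this relies on the names of fun_list matching the column names.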

You can also define the dispatch function and the list of functions dynamically (in this case, the environment is not the global one):

 get_dispatch <- function(fun_list) {
   return(lazy_(quote((fun_list[[as.character(substitute(.))]](.))), env = environment()))
 }
 dispatch <- get_dispatch(lapply(iris[-5], function(x) if(var(x) > 0.7) median else mean))