Creating a function with an argument passed to dplyr::filter: what is the best way to work around NSE?

Non-standard evaluation (NSE) is really convenient when using dplyr verbs. But it can be problematic when using these verbs with function arguments. For example, let's say that I want to create a function that gives me the number of rows for a given species.

    # Load packages and prepare data
    library(dplyr)
    library(lazyeval)
    # I prefer lowercase column names
    names(iris) <- tolower(names(iris))
    # Number of rows for all species
    nrow(iris)
    # [1] 150

Example that does not work

This function does not work as expected, because species is interpreted in the context of the iris data frame instead of being interpreted in the context of the function argument:

    nrowspecies0 <- function(dtf, species){
        dtf %>%
            filter(species == species) %>%
            nrow()
    }
    nrowspecies0(iris, species = "versicolor")
    # [1] 150
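The masking described above can be reproduced without dplyr. The following is a minimal base R sketch, assuming that `with()` is an acceptable stand-in for the way `filter()` looks up names in the data frame first; the data frame `df` and the variables `masked` and `unmasked` are hypothetical names used only for illustration.

```r
# Small illustrative data frame (hypothetical, not part of the original post)
df <- data.frame(species = c("setosa", "versicolor", "versicolor"))
species <- "versicolor"

# Inside with(), both occurrences of `species` resolve to the column,
# so the column is compared with itself
masked <- with(df, species == species)
masked
# [1] TRUE TRUE TRUE

# Referring to the column explicitly compares it with the outer variable
unmasked <- df$species == species
unmasked
# [1] FALSE  TRUE  TRUE
```

Because every row satisfies the masked comparison, a filter built on it keeps all rows, which matches the 150 returned by nrowspecies0.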

3 implementation examples

To get around the non-standard evaluation, I usually append an underscore to the argument name:

    nrowspecies1 <- function(dtf, species_){
        dtf %>%
            filter(species == species_) %>%
            nrow()
    }
    nrowspecies1(iris, species_ = "versicolor")
    # [1] 50
    # Because of function name completion the argument
    # species works too
    nrowspecies1(iris, species = "versicolor")
    # [1] 50

This is not entirely satisfactory, since it changes the name of the function argument to something less user friendly. Or it relies on partial matching of argument names, which, I'm afraid, is not good programming practice. To keep a nice argument name, I could do:

    nrowspecies2 <- function(dtf, species){
        species_ <- species
        dtf %>%
            filter(species == species_) %>%
            nrow()
    }
    nrowspecies2(iris, species = "versicolor")
    # [1] 50

Another way to work around NSE is based on this answer. interp() interprets species in the context of the function environment:

    nrowspecies3 <- function(dtf, species){
        dtf %>%
            filter_(interp(~species == with_species,
                           with_species = species)) %>%
            nrow()
    }
    nrowspecies3(iris, species = "versicolor")
    # [1] 50
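For readers on later dplyr releases, where filter_() is deprecated, the same disambiguation between the column and the function argument can be sketched with the `.data` and `.env` pronouns. This is an illustration rather than part of the original question: it assumes a dplyr/rlang version recent enough to provide both pronouns, and `nrowspecies_pronoun` and `iris_lc` are hypothetical names.

```r
library(dplyr)

# Lowercase copy of iris, so the built-in data set is left untouched
iris_lc <- setNames(iris, tolower(names(iris)))

# Hypothetical helper: .data$species is the column,
# .env$species is the function argument
nrowspecies_pronoun <- function(dtf, species) {
  dtf %>%
    filter(.data$species == .env$species) %>%
    nrow()
}

nrowspecies_pronoun(iris_lc, species = "versicolor")
# [1] 50
```

The argument keeps its user-friendly name, and no intermediate variable such as species_ is needed.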

Given the 3 functions above, which one is the most robust way to implement this filter function? Are there any other ways?

+8
r dplyr nse
3 answers

The answer from @eddi is correct about what is going on here. I am writing another answer that addresses the larger question of how to write functions using dplyr verbs. You will notice that, ultimately, it uses something like nrowspecies2 to avoid the species == species tautology.

To write a function wrapping dplyr verb(s) that will work with NSE, write two functions:

First, write a version that requires quoted inputs, using lazyeval and the SE version of the dplyr verb. So in this case, filter_.

    nrowspecies_robust_ <- function(data, species){
        species_ <- lazyeval::as.lazy(species)
        condition <- ~ species == species_ # *
        tmp <- dplyr::filter_(data, condition) # **
        nrow(tmp)
    }
    nrowspecies_robust_(iris, ~versicolor)

Second, write a version that uses NSE:

    nrowspecies_robust <- function(data, species) {
        species <- lazyeval::lazy(species)
        nrowspecies_robust_(data, species)
    }
    nrowspecies_robust(iris, versicolor)

* = if you want to do something more complex, you may need to use lazyeval::interp here, as in the resources listed below

** = also, if you need to change the output names, see the .dots argument

  • For the above, I followed some tips from Hadley

  • Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp and other functions from the lazyeval package

  • For more information on lazyeval, see its vignette

  • For a detailed discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R
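The base R tools mentioned in the last bullet can be sketched as follows. This is a simplified illustration that swaps lazyeval for `substitute()` and `deparse()`; it is not the answer's own method, and `nrow_species_bare` and `iris_lc` are hypothetical names.

```r
# Lowercase copy of iris, so the built-in data set is left untouched
iris_lc <- setNames(iris, tolower(names(iris)))

# Hypothetical NSE wrapper: captures the unevaluated argument
# and turns the bare name into a string
nrow_species_bare <- function(data, species) {
  species_chr <- deparse(substitute(species))
  # data$species is the column; species_chr is the captured name
  nrow(data[data$species == species_chr, , drop = FALSE])
}

nrow_species_bare(iris_lc, versicolor)
# [1] 50
```

A wrapper like this is hard to call from other functions, because it captures whatever expression the caller typed. That limitation is one reason for pairing each NSE function with a version that takes quoted input.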

+5

This question has absolutely nothing to do with non-standard evaluation. Let me rewrite your initial function to make this clear:

    nrowspecies4 <- function(dtf, boo){
        dtf %>%
            filter(boo == boo) %>%
            nrow()
    }
    nrowspecies4(iris, boo = "versicolor")
    #150

The expression inside your filter always evaluates to TRUE (almost always; see the example below). That is why it doesn't work, not because of some NSE magic.

Your nrowspecies2 is the way to go.

FWIW, species in your nrowspecies0 is indeed evaluated as the column, not as the input variable species, and you can check this by comparing nrowspecies0(iris, NA) with nrowspecies4(iris, NA).
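The comparison suggested above can be run as the following sketch. It redefines the two functions from this thread so that the block is self-contained; `iris_lc` is a hypothetical lowercase copy of iris.

```r
library(dplyr)

# Lowercase copy of iris, so the built-in data set is left untouched
iris_lc <- setNames(iris, tolower(names(iris)))

nrowspecies0 <- function(dtf, species) {
  dtf %>% filter(species == species) %>% nrow()
}
nrowspecies4 <- function(dtf, boo) {
  dtf %>% filter(boo == boo) %>% nrow()
}

# `species` resolves to the column, so the NA argument is never used
nrowspecies0(iris_lc, NA)
# [1] 150

# `boo` is not a column, so NA == NA gives NA and filter() drops every row
nrowspecies4(iris_lc, NA)
# [1] 0
```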

+3

In his 2016 useR! talk (at 38 min 30 s), Hadley Wickham explains the concept of referential transparency. Using a formula, the filter function can be reformulated as follows:

    nrowspecies5 <- function(dtf, formula){
        dtf %>%
            filter_(formula) %>%
            nrow()
    }

This has the added benefit of being more general.

    nrowspecies5(iris, ~ species == "versicolor")
    # 50
    nrowspecies5(iris, ~ sepal.length > 6 & species == "virginica")
    # 41
    nrowspecies5(iris, ~ sepal.length > 6 & species == "setosa")
    # 0
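On later dplyr releases, where filter_() is deprecated, a comparable "pass the whole condition" interface can be sketched with the embrace operator `{{ }}`. This assumes a dplyr/rlang version that supports `{{ }}`; `nrowspecies6` and `iris_lc` are hypothetical names and not part of the original answer.

```r
library(dplyr)

# Lowercase copy of iris, so the built-in data set is left untouched
iris_lc <- setNames(iris, tolower(names(iris)))

# Hypothetical helper: forwards the caller's unevaluated condition to filter()
nrowspecies6 <- function(dtf, condition) {
  dtf %>%
    filter({{ condition }}) %>%
    nrow()
}

nrowspecies6(iris_lc, species == "versicolor")
# [1] 50
nrowspecies6(iris_lc, sepal.length > 6 & species == "virginica")
# [1] 41
nrowspecies6(iris_lc, sepal.length > 6 & species == "setosa")
# [1] 0
```

Here the caller writes the condition without the leading `~`, and it is evaluated against the columns of the data frame in the same way as in a direct filter() call.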
0