Creating a function with an argument passed to dplyr::filter: what is the best way to work around NSE?

Question

Non-standard evaluation (NSE) is really convenient when using dplyr verbs. But it can be problematic when using these verbs with function arguments. For example, let's say that I want to create a function that gives me the number of rows for a given species.

    # Load packages and prepare data
    library(dplyr)
    library(lazyeval)
    # I prefer lowercase column names
    names(iris) <- tolower(names(iris))
    # Number of rows for all species
    nrow(iris)
    # [1] 150

Example does not work

This function does not work as expected, because species is interpreted in the context of the iris data frame instead of being interpreted in the context of the function argument:

    nrowspecies0 <- function(dtf, species){
        dtf %>%
            filter(species == species) %>%
            nrow()
    }
    nrowspecies0(iris, species = "versicolor")
    # [1] 150
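The tautology can be reproduced without dplyr. The following is a minimal sketch in base R, assuming a lowercase copy of iris named iris_lc (a name introduced here for illustration only). Inside with(), both sides of the comparison resolve to the column, so every row compares equal to itself:

```r
# Hypothetical lowercase copy of iris, so the built-in data set is left untouched
iris_lc <- iris
names(iris_lc) <- tolower(names(iris_lc))

# A variable with the same name as the column, as in the function argument
species <- "versicolor"

# Within the data frame, `species` is looked up as a column first,
# so this compares the column with itself
same_col <- with(iris_lc, species == species)
all(same_col)
sum(same_col)
```

Here all(same_col) is TRUE and sum(same_col) is 150, which matches the unexpected result of nrowspecies0.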

3 implementation examples

To get around non-standard evaluation, I usually append an underscore to the argument name:

    nrowspecies1 <- function(dtf, species_){
        dtf %>%
            filter(species == species_) %>%
            nrow()
    }
    nrowspecies1(iris, species_ = "versicolor")
    # [1] 50
    # Because of partial argument matching, the argument
    # species works too
    nrowspecies1(iris, species = "versicolor")
    # [1] 50

This is not entirely satisfactory, since it changes the name of the function argument to something less user friendly. Or it relies on partial argument matching which, I'm afraid, is not good programming practice. To keep a nice argument name, I could do:

    nrowspecies2 <- function(dtf, species){
        species_ <- species
        dtf %>%
            filter(species == species_) %>%
            nrow()
    }
    nrowspecies2(iris, species = "versicolor")
    # [1] 50
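The same disambiguation can be sketched without the intermediate variable, assuming a recent dplyr (1.0 or later) in which the .data and .env pronouns are available inside filter(). These pronouns did not exist when this question was asked, and the names iris_lc and nrowspecies_env are hypothetical:

```r
library(dplyr)

# Hypothetical lowercase copy of iris
iris_lc <- iris
names(iris_lc) <- tolower(names(iris_lc))

# Assumption: dplyr >= 1.0, where .data refers to the data frame columns
# and .env refers to the calling (function) environment
nrowspecies_env <- function(dtf, species) {
  dtf %>%
    filter(.data$species == .env$species) %>%
    nrow()
}

nrowspecies_env(iris_lc, species = "versicolor")
# [1] 50
```

The argument keeps its user-friendly name, and the column and the argument are told apart explicitly rather than by a naming convention.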

Another way to work around non-standard evaluation is based on this answer. interp() interprets species in the context of the function environment:

    nrowspecies3 <- function(dtf, species){
        dtf %>%
            filter_(interp(~species == with_species,
                           with_species = species)) %>%
            nrow()
    }
    nrowspecies3(iris, species = "versicolor")
    # [1] 50

Given the three functions above, which is the most robust way to implement this filter function? Are there any other ways?


r dplyr nse

Paul rougieux Apr 15 '16 at 12:42


3 answers

jaimedash · Answer 1 · 2016-04-15T18:30:41+0000

The answer from @eddi is correct about what is going on here. I am writing another answer that addresses the broader question of how to write functions using dplyr verbs. You will notice that, ultimately, it uses something like nrowspecies2 to avoid the tautology species == species.

To write a function wrapping dplyr verb(s) that works with NSE, write two functions:

First, write a version that requires quoted inputs, using lazyeval and the SE version of the dplyr verb. So in this case, filter_.

    nrowspecies_robust_ <- function(data, species){
        species_ <- lazyeval::as.lazy(species)
        condition <- ~ species == species_ # *
        tmp <- dplyr::filter_(data, condition) # **
        nrow(tmp)
    }
    nrowspecies_robust_(iris, ~versicolor)

Second, write a version that uses NSE:

    nrowspecies_robust <- function(data, species) {
        species <- lazyeval::lazy(species)
        nrowspecies_robust_(data, species)
    }
    nrowspecies_robust(iris, versicolor)

* = if you want to do something more complex, you may need to use lazyeval::interp here, as in the links below

** = also, if you need to change the output names, see the .dots argument

For the above, I followed some tips from Hadley
Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp and other functions from the lazyeval package
For more information on lazyeval, see its vignette
For a thorough discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R

eddi · Answer 2 · 2016-04-15T15:26:43+0000

This question has absolutely nothing to do with non-standard evaluation. Let me rewrite your initial function to make this clear:

    nrowspecies4 <- function(dtf, boo){
        dtf %>%
            filter(boo == boo) %>%
            nrow()
    }
    nrowspecies4(iris, boo = "versicolor")
    #150

The expression inside your filter always evaluates to TRUE (almost always, see the example below), and that is why it doesn't work, not because of some NSE magic.

Your nrowspecies2 is the way to go.

FWIW, species in your nrowspecies0 is indeed evaluated as the column, not as the input variable species, and you can check this by comparing nrowspecies0(iris, NA) with nrowspecies4(iris, NA).
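The comparison can be sketched as follows, assuming dplyr is installed and using a lowercase copy of iris named iris_lc (a name introduced here for illustration). With NA as input, the column-versus-column comparison keeps every row, while NA == NA evaluates to NA and filter() drops rows where the condition is NA:

```r
library(dplyr)

# Hypothetical lowercase copy of iris
iris_lc <- iris
names(iris_lc) <- tolower(names(iris_lc))

# `species` resolves to the column on both sides
nrowspecies0 <- function(dtf, species) {
  dtf %>% filter(species == species) %>% nrow()
}

# `boo` is not a column, so both sides resolve to the argument
nrowspecies4 <- function(dtf, boo) {
  dtf %>% filter(boo == boo) %>% nrow()
}

n_col <- nrowspecies0(iris_lc, NA)  # argument is ignored, column == column
n_arg <- nrowspecies4(iris_lc, NA)  # NA == NA is NA, so no row is kept
c(n_col, n_arg)
```

The two counts differ (150 versus 0), which shows that nrowspecies0 never looks at its species argument.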

Paul rougieux · Answer 3 · 2016-08-11T13:56:41+0000

In his 2016 useR! talk (at 38min30s), Hadley Wickham explains the concept of referential transparency. Using a formula, the filter function can be rewritten as follows:

    nrowspecies5 <- function(dtf, formula){
        dtf %>%
            filter_(formula) %>%
            nrow()
    }

This has the added benefit of being more general.

    nrowspecies5(iris, ~ species == "versicolor")
    # 50
    nrowspecies5(iris, ~ sepal.length > 6 & species == "virginica")
    # 41
    nrowspecies5(iris, ~ sepal.length > 6 & species == "setosa")
    # 0
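filter_() has been deprecated since dplyr 0.7. In newer versions, a similar general-purpose wrapper could be sketched by forwarding the conditions through ... to filter(). This is an illustration under that assumption, not the method from the talk; nrow_where and iris_lc are hypothetical names:

```r
library(dplyr)

# Hypothetical lowercase copy of iris
iris_lc <- iris
names(iris_lc) <- tolower(names(iris_lc))

# Conditions passed through `...` are evaluated by filter()
# in the context of the data frame
nrow_where <- function(dtf, ...) {
  dtf %>%
    filter(...) %>%
    nrow()
}

nrow_where(iris_lc, species == "versicolor")
# [1] 50
nrow_where(iris_lc, sepal.length > 6 & species == "virginica")
# [1] 41
```

The caller writes plain conditions instead of formulas, at the cost of the function no longer having a single named argument for the condition.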
