Proper use of dplyr::select in dplyr 0.7.0+: selecting columns with a character vector

Suppose we have a character vector, cols_to_select, containing the names of some columns that we want to select from the data frame df, for example:

    df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)  # tibble::data_frame() is deprecated in newer tibble releases
    cols_to_select <- c("b", "d")

Suppose also that we want to use dplyr::select because it is part of a pipeline built with %>%, so using select makes the code easier to read.

There seem to be several ways to do this, but some are more reliable than others. Could you please tell me which is the “correct” version and why? Or is there another, better way?

    dplyr::select(df, cols_to_select)     # Fails if 'cols_to_select' happens to be the name of a column in df
    dplyr::select(df, !!cols_to_select)   # i.e. using UQ()
    dplyr::select(df, !!!cols_to_select)  # i.e. using UQS()

    cols_to_select_syms <- rlang::syms(c("b", "d"))  # See [here](https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures/44657171#44657171)
    dplyr::select(df, !!!cols_to_select_syms)

P.S. I understand that this can be done in base R with just df[, cols_to_select].

1 answer

There is an example with dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:

 dplyr::select(df, !!cols_to_select) 

Why? Let's look at the options you mentioned:

Option 1

 dplyr::select(df, cols_to_select) 

As you say, this fails if cols_to_select is also the name of a column in df, so this is not the right approach.
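A minimal sketch of that failure, assuming a recent dplyr/tidyselect (dplyr 0.7.0 may differ in details such as warnings). The data frame df2 is hypothetical, made up for illustration: because it contains a column literally named cols_to_select, the bare symbol is resolved to that column instead of to the external character vector.

```r
cols_to_select <- c("b", "d")

# Hypothetical data frame that happens to contain a column named "cols_to_select"
df2 <- tibble::tibble(a = 1:3, b = 1:3, d = 1:3, cols_to_select = 4:6)

# Column names are matched before the calling environment is searched,
# so this selects the single column "cols_to_select", not b and d
res <- dplyr::select(df2, cols_to_select)
names(res)
```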

Option 4

    cols_to_select_syms <- rlang::syms(c("b", "d"))
    dplyr::select(df, !!!cols_to_select_syms)

This works, but it is more convoluted than the other solutions.
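For reference, here is a sketch of what Option 4 does, assuming a recent rlang: rlang::syms() turns each string into a symbol, and rlang::expr() shows the call after splicing without evaluating it. The variable name spliced is just for illustration.

```r
cols_to_select <- c("b", "d")
df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)

# A list of two symbols: b and d
cols_to_select_syms <- rlang::syms(cols_to_select)

# Inspect the call that select() receives after splicing
spliced <- rlang::expr(dplyr::select(df, !!!cols_to_select_syms))
print(spliced)  # dplyr::select(df, b, d)

# Actually run the selection
res <- dplyr::select(df, !!!cols_to_select_syms)
names(res)      # "b" "d"
```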

Options 2 and 3

    dplyr::select(df, !!cols_to_select)
    dplyr::select(df, !!!cols_to_select)

These two solutions give the same result in this case. You can see the output of !!cols_to_select and !!!cols_to_select by doing:

    dput(rlang::`!!`(cols_to_select))   # c("b", "d")
    dput(rlang::`!!!`(cols_to_select))  # pairlist("b", "d")
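In more recent rlang releases, !! and !!! are meant to be used only inside quoting functions, so calling rlang::`!!` directly as above may raise an error. A sketch of an alternative, assuming a recent rlang: rlang::expr() shows the call dplyr::select would receive after unquoting. The names one_arg and two_args are just for illustration.

```r
cols_to_select <- c("b", "d")

# !! inlines the whole character vector as a single argument
one_arg <- rlang::expr(dplyr::select(df, !!cols_to_select))
print(one_arg)   # dplyr::select(df, c("b", "d"))

# !!! splices each element as a separate argument
two_args <- rlang::expr(dplyr::select(df, !!!cols_to_select))
print(two_args)  # dplyr::select(df, "b", "d")
```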

The !! operator (or UQ()) evaluates its argument immediately, in the environment where the call is written rather than inside the data frame, and inlines the result. That is what you want here.
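To see why this matters, here is a sketch assuming a recent dplyr, reusing a hypothetical df2 that contains a column named cols_to_select: with !! (or !!!), the value c("b", "d") is substituted before select() looks at column names, so the name clash from Option 1 no longer matters.

```r
cols_to_select <- c("b", "d")

# Hypothetical data frame with a column whose name clashes with the variable
df2 <- tibble::tibble(a = 1:3, b = 1:3, d = 1:3, cols_to_select = 4:6)

# The variable's value is inlined before tidyselect sees the data
res_bang <- dplyr::select(df2, !!cols_to_select)
res_splice <- dplyr::select(df2, !!!cols_to_select)
names(res_bang)    # "b" "d"
names(res_splice)  # "b" "d"
```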

The !!! operator (or UQS()) is used to splice several arguments into a function call at once.
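Outside dplyr, a minimal illustration of splicing with base R's sum(), assuming a recent rlang: each list element becomes a separate argument of the call. The names args and spliced are illustrative only.

```r
args <- list(1:3, 4:6)

# !!! turns the two list elements into two separate arguments of sum()
spliced <- rlang::expr(sum(!!!args))
print(spliced)  # sum(1:3, 4:6)
eval(spliced)   # 21
```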

For character column names, as in your example, it does not matter whether you pass them as one vector of length 2 (using !!) or as a list of two vectors of length 1 (using !!!). For more complex use cases, you will need to pass several arguments as a list (using !!!):

    a <- quos(contains("c"), dplyr::starts_with("b"))
    dplyr::select(df, !!a)   # does not work
    dplyr::select(df, !!!a)  # does work
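A self-contained version of this example, assuming a recent dplyr, which re-exports the tidyselect helpers contains() and starts_with(). Instead of running the failing !!a call, it inspects the structure with rlang::expr() using a hypothetical function f that is never evaluated: !! inlines the whole list of quosures as a single argument, while !!! passes each quosure as its own selection.

```r
df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)

# Two selection expressions captured as quosures
a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b"))

# Hypothetical f is only quoted here, never called
length(rlang::expr(f(!!a)))   # 2: f plus one list argument
length(rlang::expr(f(!!!a)))  # 3: f plus two quosure arguments

# !!! splices each quosure as a separate selection
res <- dplyr::select(df, !!!a)
names(res)                    # "c" "b"
```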