Using dplyr n_distinct in a function with a quoted variable

Question

I am trying to use dplyr inside a function, passing the column name as a string variable and then using it with n_distinct inside summarize.

I understand that programming with dplyr has become easier with the functions summarize_, arrange_, etc., as described in vignette("nse"). I have tried various combinations of interp from lazyeval. n_distinct responds with "Input to n_distinct() must be a single variable name from the data set" (which makes sense, since I only have the variable name in a string...)

This works fine outside the function (mention is the name of a column in the data.frame):

summarize(data, count=n_distinct(mention))

This was my first effort:

getProportions <- function(datain, id_column) {
    overall_total <- summarize(datain, count=n_distinct(id_column))[1,1]
}

getProportions(measures, "mention")

And after reading the NSE documentation and some threads here about programming with dplyr, I tried:

overall_total <- summarize_(datain, count=interp(~n_distinct(var),var=as.name(id_column)))[1,1]

Do I need something like n_distinct_()?

Edit: The problem turned out to be in getting the var part right across summarize(), summarize_(), and the var = part of the interp() call.

r dplyr

jameshowison 14 . '15 17:45

jameshowison · Answer 1 · 2015-01-15T19:59:33+0000

The following works; the key is the var = part of the interp() call:

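library(dplyr)
library(lazyeval)

# col is the column name as a string; as.name() turns it into a symbol
f <- function(data, col) {
    summarise_(data, count = interp(~n_distinct(var), var = as.name(col)))
}
f(mtcars, "cyl")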
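
For readers on a later dplyr release (0.7 or newer), where the underscore verbs such as summarise_ are deprecated, the same result can be had without lazyeval by using the .data pronoun. This is only a sketch under that version assumption, not part of the original answer, and the helper name count_distinct is made up for illustration:

```r
library(dplyr)

# Hypothetical helper (the name is illustrative, not from the thread).
# `col` is a column name supplied as a string.
count_distinct <- function(data, col) {
  # .data[[col]] looks the column up by its string name inside the data mask
  summarise(data, count = n_distinct(.data[[col]]))
}

result <- count_distinct(mtcars, "cyl")
result
```

Since mtcars$cyl takes the values 4, 6 and 8, result$count is 3. Appending [1, 1] to the summarise() call pulls out the bare number, as in the getProportions attempt in the question.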
