dplyr: standard evaluation for mutate with specified variable names

How can I use mutate (my presumption is that I need standard evaluation in my case, and therefore mutate_, but I'm not quite sure about this) inside a function that takes a list of variable names, for example:

 createSum = function(data, variableNames) {
   data %>%
     mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                             var = as.name(paste(as.character(variableNames), collapse = ","))))
 }

Here is an MWE that strips the function down to its core logic and demonstrates what I'm trying to achieve:

 library(dplyr)
 library(lazyeval)

 # function to make random table with given column names
 makeTable = function(colNames, sampleSize) {
   liSample = lapply(colNames, function(week) {
     sample = rnorm(sampleSize)
   })
   names(liSample) = as.character(colNames)
   return(tbl_df(data.frame(liSample, check.names = FALSE)))
 }

 # create some sample data with the column name patterns required
 weekDates = seq.Date(from = as.Date("2014-01-01"), to = as.Date("2014-08-01"), by = "week")
 dfTest = makeTable(weekDates, 10)

 # test mutate on this table
 dfTest %>%
   mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                           var = as.name(paste(as.character(weekDates), collapse = ","))))

The expected result is what the following returns:

 rowSums(dfTest[, as.character(weekDates)]) 
2 answers

I think this is what you are after

 createSum = function(data, variableNames) {
   data %>%
     mutate_(sumvar = paste(as.character(variableNames), collapse = "+"))
 }

 createSum(dfTest, weekDates)

Here we simply pass a character string rather than using interp, because you cannot pass a list of names as a single argument to a function. Also, sum() would do some unwanted collapsing, because operations are not performed row-wise; whole column vectors are passed in at a time.
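To illustrate the point about sum() above, here is a small base R sketch. The data frame `df` and its columns `a` and `b` are made up for illustration and are not part of the question. sum() reduces all of its arguments to a single number, whereas `+` and rowSums() produce one value per row:

```r
# Hypothetical toy data, not from the question
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# sum() collapses every value in both columns into one scalar
collapsed <- sum(df$a, df$b)

# `+` is vectorised, so it yields one value per row
rowwise_plus <- df$a + df$b

# rowSums() gives the same per-row totals (names dropped for comparison)
rowwise_rowsums <- unname(rowSums(df[, c("a", "b")]))
```

Inside mutate, the single value from sum() would be recycled to every row, which is why joining the names with "+" is used in the answer instead.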

Another problem with this example is that you set check.names=FALSE in your data.frame call, which means you have created column names that are not valid symbols. You can explicitly wrap the variable names in backticks if you like

 createSum(dfTest, paste0("`", weekDates, "`"))

but in general, it’s better not to use invalid names.
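A minimal base R sketch of why the backticks matter, assuming a made-up data frame `df` with two date-like column names (not the data from the question). Without backticks the joined string is parsed as arithmetic on numbers; with them, each name is looked up as a column:

```r
# Hypothetical toy data with non-syntactic (date-like) column names
df <- data.frame(x = c(1, 2), y = c(10, 20))
names(df) <- c("2014-01-01", "2014-01-08")

# Unquoted: the string parses as subtraction and addition of plain numbers
unquoted <- paste(names(df), collapse = "+")
as_arithmetic <- eval(parse(text = unquoted), df)

# Backtick-quoted: each name refers to a column of df
quoted <- paste0("`", names(df), "`", collapse = "+")
as_columns <- eval(parse(text = quoted), df)
```

Here `as_arithmetic` is a single number unrelated to the data, while `as_columns` holds the per-row sums of the two columns.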


I don't know if this is the “officially sanctioned” dplyr method, but here is one possibility:

 weekDates = as.character(weekDates) # more convenient

 dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
 # or
 dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))

This can carry a significant performance penalty, depending on your use case: besides dplyr's usual copying of the entire data, I think it also copies it a second time during the inner calculation. You could look into data.table to avoid the extra copying by adding columns in place (and using .SDcols to avoid the second copy); you would arguably get nicer syntax as well.
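As a rough sketch of the data.table route, assuming the data.table package is installed. The table `dt` and the vector `cols` are illustrative and not from the original post. `:=` adds the column by reference, and `.SDcols` restricts `.SD` to the named columns:

```r
library(data.table)

# Hypothetical toy data with date-like column names plus an unrelated column
dt <- data.table(`2014-01-01` = c(1, 2),
                 `2014-01-08` = c(10, 20),
                 other = c(100, 200))
cols <- c("2014-01-01", "2014-01-08")

# Add the row-wise sum in place; only the columns listed in `cols` enter .SD
dt[, sumvar := Reduce(`+`, .SD), .SDcols = cols]
```

If missing values need to be ignored, `rowSums(.SD, na.rm = TRUE)` could be used in place of the `Reduce` call.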
