Is it possible to reuse created columns in ddply?

I have a script where I use ddply, as in the following example:

ddply(df, .(col), function(x) data.frame( col1=some_function(x$y), col2=some_other_function(x$y) ) ) 

Inside ddply, is it possible to reuse col1 without re-invoking the whole function?

For instance:

    ddply(df, .(col), function(x) data.frame(
      col1 = some_function(x$y),
      col2 = some_other_function(x$y),
      col3 = col1 * col2
    ))
3 answers

You have a whole function to play with! It does not have to be a single line. This should work:

    ddply(df, .(col), function(x) {
      # compute each value once, then reuse the local variables
      col1 <- some_function(x$y)
      col2 <- some_other_function(x$y)
      data.frame(col1 = col1, col2 = col2, col3 = col1 * col2)
    })
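To make the pattern concrete, here is a minimal self-contained sketch. The toy data frame is made up, and mean and sd are only stand-ins for some_function and some_other_function, which the question does not define.

```r
library(plyr)

# Toy data: column names mirror the question, values are invented.
df <- data.frame(col = c("a", "a", "b", "b"), y = c(1, 3, 10, 14))

res <- ddply(df, .(col), function(x) {
  col1 <- mean(x$y)  # stand-in for some_function (assumption)
  col2 <- sd(x$y)    # stand-in for some_other_function (assumption)
  # col1 and col2 are plain local variables, so they can be reused freely
  data.frame(col1 = col1, col2 = col2, col3 = col1 * col2)
})

print(res)
```

Each of the two functions is called once per group, and col3 is built from the stored results rather than by calling them again.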

This looks like a good candidate for data.table, using the scoping rules for j expressions. See FAQ 2.8 for details.

From the FAQ:

No anonymous function is passed to j. Instead, an anonymous body is passed to j.

So for your case

    library(data.table)
    DT <- as.data.table(df)
    DT[, {
      col1 <- some_function(y)
      col2 <- some_other_function(y)
      col3 <- col1 * col2
      list(col1 = col1, col2 = col2, col3 = col3)
    }, by = col]

or in a slightly more direct way:

    DT[, list(
      col1 = col1 <- some_function(y),
      col2 = col2 <- some_other_function(y),
      col3 = col1 * col2
    ), by = col]

This avoids one repetition of each of col1 and col2 and two repetitions of col3; reducing repetition is one of the aims of data.table. The = followed by <- may look cumbersome at first, but it allows the following syntactic sugar:

    DT[, list(
      "Projected return (%)" = col1 <- some_function(y),
      "Investment ($m)" = col2 <- some_other_function(y),
      "Return on Investment ($m)" = col1 * col2
    ), by = col]

where the output can be sent directly to LaTeX or HTML, for example.
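As a rough, runnable illustration of the braces form, the sketch below uses made-up data, with mean and sd standing in for some_function and some_other_function (both are assumptions, since the question leaves them undefined).

```r
library(data.table)

# Toy data: column names mirror the question, values are invented.
DT <- data.table(col = c("a", "a", "b", "b"), y = c(1, 3, 10, 14))

res <- DT[, {
  col1 <- mean(y)  # stand-in for some_function (assumption)
  col2 <- sd(y)    # stand-in for some_other_function (assumption)
  # the last expression in the braces is what j returns for each group
  list(col1 = col1, col2 = col2, col3 = col1 * col2)
}, by = col]

print(res)
```

Because the body of j is evaluated as an ordinary expression for each group, the intermediate values are just local variables and need no extra function wrapper.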


I do not think this is possible, but it should not matter much, because at that point it is no longer an aggregation function. For instance:

    # use summarize() in ddply()
    data.means <- ddply(data, .(groups), summarize,
                        mean = mean(x), sd = sd(x), n = length(x))
    data.means$se <- data.means$sd / sqrt(data.means$n)
    data.means$Upper <- data.means$mean + (data.means$se * 1.96)
    data.means$Lower <- data.means$mean - (data.means$se * 1.96)

So I did not calculate the SEs directly, but it was not so bad to calculate them outside of ddply(). If you really want to, you can also do

 ddply(data, .(groups), summarize, se = sd(x) / sqrt(length(x))) 

Or, in terms of your example:

    ddply(df, .(col), summarize,
          col1 = some_function(y),
          col2 = some_other_function(y),
          col3 = some_function(y) * some_other_function(y))