I would like to aggregate a data.frame by an identifier variable called ensg. The data frame looks like this:
      chromosome probeset               ensg symbol   XXA_00   XXA_36   XXB_00
    1          X  4938842 ENSMUSG00000000003   Pbsn 4.796123 4.737717 5.326664
I want to calculate the mean of each numeric column over the rows that share the same ensg value. The problem is that I would like to leave the chromosome and symbol variables intact, since they are identical for all rows with the same ensg.
In the end, I would like to have a data.frame with the identifying columns chromosome, ensg and symbol, plus the mean of each numeric column over the rows with the same identifier. I implemented this with ddply, but it is very slow compared to aggregate:
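    library(plyr)

    # For one ensg group: keep the descriptive columns of the first row
    # and append the column means of the numeric columns.
    spec.mean <- function(eset.piece) {
      cbind(eset.piece[1, -numeric.columns],
            t(colMeans(eset.piece[, numeric.columns])))
    }
    mean.eset <- ddply(eset.consensus.grand, .(ensg), spec.mean, .progress = "tk")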
My first aggregate implementation looks like this:
    mean.eset <- aggregate(eset[, numeric.columns], by = list(eset$ensg),
                           FUN = mean, na.rm = TRUE)
This is much faster. But the problem with aggregate is that I have to re-attach the descriptive variables afterwards. I have not figured out how to use my custom function with aggregate, since aggregate passes only vectors to the function, not data frames.
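A minimal sketch of that re-attaching step on a made-up toy data frame. The values, the name id.columns and the use of merge are illustrative assumptions, not the real data or the code actually used:

```r
# Toy data with the same layout as above (all values are made up)
eset <- data.frame(
  chromosome = c("X", "X", "1"),
  probeset   = c(1L, 2L, 3L),
  ensg       = c("ENSMUSG_A", "ENSMUSG_A", "ENSMUSG_B"),
  symbol     = c("GeneA", "GeneA", "GeneB"),
  XXA_00     = c(4, 6, 5),
  XXA_36     = c(1, 3, 2),
  stringsAsFactors = FALSE
)
numeric.columns <- 5:6                              # positions of the numeric columns
id.columns      <- c("chromosome", "ensg", "symbol") # hypothetical helper name

# Per-group means; aggregate() hands FUN one column vector per group
means <- aggregate(eset[, numeric.columns], by = list(ensg = eset$ensg),
                   FUN = mean, na.rm = TRUE)

# One row of descriptive columns per ensg, re-attached by the shared key
ids <- unique(eset[, id.columns])
mean.eset <- merge(ids, means, by = "ensg")
```

This gives one row per ensg, but it needs the extra unique and merge steps, which is the part that feels clumsy.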
Is there an elegant way to do this with aggregate? Or is there a faster way to do it with ddply?
Johannes 