I would like to aggregate a data.frame by an identifier variable called ensg. The data frame looks like this:
  chromosome probeset               ensg symbol   XXA_00   XXA_36   XXB_00
1          X  4938842 ENSMUSG00000000003   Pbsn 4.796123 4.737717 5.326664
I want to calculate the mean of each numeric column over rows with the same ensg value. The problem is that I would like to leave the other identifier variables, chromosome and symbol, untouched, since they are also the same for the same ensg.
In the end, I would like to have a data.frame with the identifier columns chromosome, ensg, symbol and the mean of each numeric column over the rows that share an identifier. I implemented this with ddply, but it is very slow compared to aggregate:
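library(plyr)

# Keep the columns outside numeric.columns from the first row of each group
# and append the column means of the numeric columns.
spec.mean <- function(eset.piece) {
  cbind(eset.piece[1, -numeric.columns],
        t(colMeans(eset.piece[, numeric.columns])))
}

mean.eset <- ddply(eset.consensus.grand, .(ensg), spec.mean, .progress = "tk")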
My first aggregate implementation looks like this:
mean.eset <- aggregate(eset[, numeric.columns], by = list(eset$ensg), FUN = mean, na.rm = TRUE)
It is much faster. But the problem with aggregate is that I have to re-attach the describing variables afterwards. I could not figure out how to use my custom function with aggregate, since aggregate does not pass data frames to the function, only vectors.
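To make the re-attaching step concrete, here is a rough sketch of what I mean, not a definitive solution. The toy data, the second identifier ENSG_HYPOTHETICAL, and the name id.columns are made up for illustration; only the column names come from my real data. I also assume here that numeric.columns holds column names rather than indices, and that chromosome and symbol really are constant within each ensg.

```r
# Hypothetical toy data; values are invented for illustration only.
eset <- data.frame(
  chromosome = c("X", "X", "1"),
  probeset   = c(1, 2, 3),
  ensg       = c("ENSMUSG00000000003", "ENSMUSG00000000003", "ENSG_HYPOTHETICAL"),
  symbol     = c("Pbsn", "Pbsn", "GeneB"),
  XXA_00     = c(4.0, 6.0, 5.0),
  XXA_36     = c(4.5, 5.5, 7.0),
  stringsAsFactors = FALSE
)

# Assumption: columns are addressed by name in this sketch.
numeric.columns <- c("XXA_00", "XXA_36")
id.columns      <- c("chromosome", "ensg", "symbol")

# Step 1: per-ensg means of the numeric columns.
means <- aggregate(eset[, numeric.columns],
                   by = list(ensg = eset$ensg),
                   FUN = mean, na.rm = TRUE)

# Step 2: one row of identifier columns per ensg
# (relies on the identifiers being constant within a group).
ids <- unique(eset[, id.columns])

# Step 3: re-attach the identifiers to the means.
mean.eset <- merge(ids, means, by = "ensg")
```

This works, but the extra unique and merge calls are exactly the re-binding I would like to avoid.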
Is there an elegant way to do this with aggregate? Or is there a faster way to do this with ddply?
Johannes