Replace values with their average in a data frame in R

I need to replace the values of two replicates (A and B) in a data frame with their average value.

This is the data frame:

    Sample.Name <- c("sample01","sample01","sample02","sample02","sample03","sample03")
    Rep <- c("A", "B", "A", "B", "A", "B")
    Rep <- as.factor(Rep)
    joy <- sample(1000:50000000, size=120, replace=TRUE)
    values <- matrix(joy, nrow=6, ncol=20)
    df.data <- cbind.data.frame(Sample.Name, Rep, values)
    names(df.data)[-c(1:2)] <- paste("V", 1:20, sep="")

And this is the loop I was trying to write in order to replace the replicates with their average:

    Sample <- as.factor(Sample.Name)
    livelli <- levels(Sample)
    for (i in (1:(length(livelli)))){
      estrai.replica <- which(df.data == livelli[i])
      media.replica <- apply(values[estrai.replica,], 2, mean)
      foo <- rbind(media.replica)
    }

Main problems:

  • I only get the last row in my new data frame (foo), and
  • I do not have the sample name in any column.

Do you have any suggestions?

3 answers

I think you want to aggregate your data frame. Try the following:

 aggregate(df.data, by=list(Sample.Name), FUN=mean) 
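
In current versions of R, calling mean on the non-numeric Sample.Name and Rep columns returns NA with a warning, so it can help to aggregate only the numeric columns. A minimal sketch, assuming the df.data layout from the question; a smaller, seeded version is rebuilt here so the block runs on its own, and avg is just an illustrative name:

```r
# Rebuild a cut-down version of the question's data frame
# (3 value columns instead of 20; the seed is an assumption for reproducibility).
set.seed(1)
Sample.Name <- rep(c("sample01", "sample02", "sample03"), each = 2)
Rep <- factor(rep(c("A", "B"), times = 3))
values <- matrix(sample(1000:50000000, size = 18, replace = TRUE), nrow = 6, ncol = 3)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-(1:2)] <- paste0("V", 1:3)

# Average the two replicates of each sample, using only the value columns.
# Naming the list element keeps "Sample.Name" as the grouping column name.
avg <- aggregate(df.data[, -(1:2)],
                 by = list(Sample.Name = df.data$Sample.Name),
                 FUN = mean)
avg
```

The result has one row per sample, with the sample name in the first column and the replicate means in the remaining columns.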

Out of curiosity, I tried a tapply based solution.

    # Not correct:
    lapply(df.data[-(1:3)], tapply, INDEX=df.data$Sample.Name, FUN=mean)

You just need to wrap it in as.data.frame.

    # Not correct:
    as.data.frame(lapply(df.data[-(1:3)], tapply, INDEX=df.data$Sample.Name, FUN=mean))

EDIT: Like @daroczig, I got an error complaining that the trim argument to mean.default was not of length 1. I tried adding the extra formal arguments, and only after also switching to the two-argument form of "[" did I manage to satisfy the interpreter, but I still did not get the correct grouping from the apply function. This version works:

    as.data.frame(
      lapply(df.data[, 3:22], function(x) tapply(x, df.data$Sample.Name, FUN=mean))
    )
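
The first attempt appears to fail because of argument matching: the named FUN=mean is matched to lapply's own FUN argument, so tapply is passed along to mean, where it lands in the trim position. Wrapping tapply in an anonymous function avoids that. The sketch below also moves the sample names into a regular column, which addresses the second problem in the question. It assumes the df.data layout from the question, rebuilt here in a smaller, seeded form; means and res are illustrative names:

```r
# Rebuild a cut-down version of the question's data frame
# (3 value columns instead of 20; the seed is an assumption for reproducibility).
set.seed(1)
Sample.Name <- rep(c("sample01", "sample02", "sample03"), each = 2)
Rep <- factor(rep(c("A", "B"), times = 3))
values <- matrix(sample(1000:50000000, size = 18, replace = TRUE), nrow = 6, ncol = 3)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-(1:2)] <- paste0("V", 1:3)

# Per-sample mean of every value column; each list element is a named 1-d array.
means <- lapply(df.data[, -(1:2)],
                function(x) tapply(x, df.data$Sample.Name, FUN = mean))

# Put the group labels into their own column and strip the array attributes.
res <- data.frame(Sample.Name = names(means[[1]]),
                  lapply(means, as.vector),
                  row.names = NULL)
res
```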

A data.table solution for time and memory efficiency

    library(data.table)
    DT <- as.data.table(df.data)
    DT[, lapply(.SD, mean), by = Sample.Name, .SDcols = paste0('V', 1:20)]

Note that .SD is the subset of the data for each group, and .SDcols defines which columns .SD contains, i.e. the columns lapply is evaluated on.
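
To make the roles of .SD and .SDcols more concrete, here is a small sketch. It assumes the data.table package is installed and uses a cut-down, seeded version of df.data; value_cols, dt_avg and sd_shape are illustrative names:

```r
library(data.table)

# Rebuild a cut-down version of the question's data frame
# (3 value columns instead of 20; the seed is an assumption for reproducibility).
set.seed(1)
Sample.Name <- rep(c("sample01", "sample02", "sample03"), each = 2)
Rep <- factor(rep(c("A", "B"), times = 3))
values <- matrix(sample(1000:50000000, size = 18, replace = TRUE), nrow = 6, ncol = 3)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-(1:2)] <- paste0("V", 1:3)

DT <- as.data.table(df.data)
value_cols <- paste0("V", 1:3)

# Within each Sample.Name group, .SD contains only the columns named in .SDcols;
# lapply() then computes the mean of each of those columns.
dt_avg <- DT[, lapply(.SD, mean), by = Sample.Name, .SDcols = value_cols]
dt_avg

# Inspect the shape of .SD per group: two replicate rows, three value columns.
sd_shape <- DT[, .(rows = nrow(.SD), cols = ncol(.SD)),
               by = Sample.Name, .SDcols = value_cols]
sd_shape
```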
