R: aggregate data.frame columns

I have a data.frame that looks like

 > head(df)
            Memory    Memory    Memory    Memory    Memory     Naive     Naive
 10472501  6.075714  5.898929  6.644946  6.023901  6.332126  8.087944  7.520194
 10509163  6.168941  6.495393  5.951124  6.052527  6.404401  7.152890  8.335509
 10496091 10.125575  9.966211 10.075613 10.310952 10.090649 11.803949 11.274480
 10427035  6.644921  6.658567  6.569745  6.499243  6.990852  8.010784  7.798154
 10503695  8.379494  8.153917  8.246484  8.390747  8.346748  9.540236  9.091740
 10451763 10.986717 11.233819 10.643245 10.230697 10.541396 12.248487 11.823138

and I would like to find the average of the Memory columns and the average of the Naive columns. The aggregate function aggregates over rows. This data.frame can potentially have a large number of rows, so transposing it and then applying aggregate over the colnames of the original data.frame strikes me as a bad idea, and it is generally annoying:

 > head(t(aggregate(t(df),list(colnames(df)), mean)))
          [,1]       [,2]
 Group.1  "Memory"   "Naive"
 10472501 "6.195123" "8.125439"
 10509163 "6.214477" "7.733625"
 10496091 "10.11380" "11.55348"
 10427035 "6.672665" "8.266854"
 10503695 "8.303478" "9.340436"

What blindingly obvious thing am I missing?

+7
r dataframe
5 answers

I am a big believer in reformatting data so that it is "long." The usefulness of the long format is especially evident with problems like this one. Fortunately, it is easy enough to reshape such data into almost any format with the reshape package.

If I understood your question correctly, you want the mean Memory and Naive value for each row. For some reason, the column names have to be unique for reshape::melt().

 colnames(df) <- paste(colnames(df), 1:ncol(df), sep = "_") 

Then you need to create an ID column. You can either do

 df$ID <- 1:nrow(df) 

or, if the row names are meaningful

 df$ID <- rownames(df) 

Now, with the reshape package

 library(reshape)
 df.m <- melt(df, id = "ID")
 df.m <- cbind(df.m, colsplit(df.m$variable, split = "_", names = c("Measure", "N")))
 df.agg <- cast(df.m, ID ~ Measure, fun = mean)

df.agg should now look like your desired output snippet.
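
For readers without the reshape package, the same long-format idea can be sketched in base R. This is only an illustration under assumptions: the toy values, the row IDs "a" and "b", and the names long and agg are made up here, and tapply stands in for melt/cast rather than reproducing their exact output format.

```r
# Toy data with duplicated column names (values are made up for illustration).
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2)
colnames(m) <- c("Memory", "Memory", "Naive", "Naive")
rownames(m) <- c("a", "b")
df <- data.frame(m, check.names = FALSE)  # check.names = FALSE keeps duplicates

# "Melt" by hand: one row per (ID, Measure, value) triple.
# unlist() walks the data frame column by column, so the IDs repeat as a
# block for every column and each column name repeats once per row.
long <- data.frame(
  ID      = rep(rownames(df), times = ncol(df)),
  Measure = rep(colnames(df), each = nrow(df)),
  value   = unlist(df, use.names = FALSE)
)

# "Cast" by hand: mean of value for every ID x Measure combination.
agg <- tapply(long$value, list(long$ID, long$Measure), mean)
agg
```

Here agg is a matrix with one row per ID and one column per measure, which plays the role of df.agg above.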

Or, if you only need the overall means across all rows, Zack's suggestion will work. Something like

 m <- colMeans(df)
 tapply(m, colnames(df), mean)

You can get the same result, but formatted as a dataframe with

 cast(df.m, . ~ Measure, fun = mean)
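
As a rough illustration of the overall-means variant, here is a self-contained sketch on a toy matrix. The values and the names col_means and group_means are assumptions made for this example. Averaging the column means gives the group mean here because every column has the same number of rows.

```r
# Toy matrix with duplicated column names (made-up values).
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2)
colnames(m) <- c("Memory", "Memory", "Naive", "Naive")

# Mean of each individual column, then the mean of those means per name.
col_means   <- colMeans(m)
group_means <- tapply(col_means, colnames(m), mean)
group_means
```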
+8

How about something like

 l <- lapply(unique(colnames(df)), function(x) rowMeans(df[,colnames(df) == x]))
 df <- do.call(cbind.data.frame, l)
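
A slightly more defensive variant of the same idea might look like the sketch below. It is an illustration, not the answerer's code: the toy data and the names groups and res are assumptions. Iterating with sapply over the group names keeps them as column names, and drop = FALSE keeps the selection two-dimensional, so rowMeans still works when a group has only one column.

```r
# Toy data: two Memory columns and a single Naive column (made-up values).
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
colnames(m) <- c("Memory", "Memory", "Naive")
df <- data.frame(m, check.names = FALSE)  # keep the duplicated names

groups <- unique(colnames(df))

# One vector of row means per group; sapply binds them into a matrix
# whose columns are named after the groups.
res <- sapply(groups, function(g) {
  rowMeans(df[, colnames(df) == g, drop = FALSE])
})
res
```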
+4

To clarify Jonathan Chang's answer: the blindingly obvious thing you are missing is that you can just select the columns and call rowMeans. That gives a vector of means, one per row. His command computes the row means for each group of unique column names, which is exactly what I was going to write. With your sample data, the result of his lapply call is a list of two vectors, one per group.

rowMeans is also very fast.

To break it down, to get the means for all of your Memory columns, just

 rowMeans(df[,colnames(df) == 'Memory'])
 # or, from your example,
 rowMeans(df[,1:5])

His is the simplest complete and correct answer; vote it up and mark it as accepted if you like it.

(By the way, I also liked Joe's recommendation to generally keep things as long data.)

+3

I think you loaded your data without header=TRUE, so what you have is a matrix of factors, and that is why your otherwise good idea does not work.
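
One way to check this diagnosis is to look at the column classes. The sketch below uses a hypothetical stand-in for df; the values and the names col_classes, agg and t_agg are assumptions. It also shows a second possible cause of quoted numbers: t() turns a data frame into a matrix, and a matrix holds a single type, so one text column (such as Group.1 in aggregate's output) makes every cell a string.

```r
# Hypothetical numeric data frame standing in for the asker's df.
df <- data.frame(Memory = c(6.1, 6.2), Naive = c(8.1, 7.7))
col_classes <- sapply(df, class)
col_classes  # "numeric" for every column suggests the data loaded fine

# A data frame with one text column, shaped like aggregate()'s output.
agg <- data.frame(Group.1 = c("Memory", "Naive"), x = c(6.2, 8.1))
t_agg <- t(agg)      # t() goes through as.matrix(), which needs one type
is.character(t_agg)  # TRUE: the numbers were converted to strings
```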

0
 > m = matrix(1:12,3)
 > colnames(m) = c(1,1,2,2)
 > m
      1 1 2  2
 [1,] 1 4 7 10
 [2,] 2 5 8 11
 [3,] 3 6 9 12
 > mt = t(m)
 > sapply(by(mt,rownames(mt),colMeans),identity)
      1    2
 V1 2.5  8.5
 V2 3.5  9.5
 V3 4.5 10.5
0