The R aggregate base can handle this, but in a weird way:
(temp <- aggregate(. ~ Temp + pH, x, function(y) cbind(mean(y), sd(y), length(y)))) # Temp pH Rep.1 Rep.2 Rep.3 Var1.1 Var1.2 Var1.3 Var2.1 Var2.2 Var2.3 # 1 10 7.6 2 1 3 4.281195 1.352194 3.000000 3.534447 1.652884 3.000000 # 2 20 7.6 2 1 3 5.840411 1.120549 3.000000 6.907273 8.628021 3.000000 # 3 10 8.1 2 1 3 5.583853 2.491672 3.000000 4.116622 1.478286 3.000000 # 4 20 8.1 2 1 3 6.635154 2.232262 3.000000 8.893188 4.208087 3.000000 # Var3.1 Var3.2 Var3.3 # 1 0.1529616 1.0762763 3.0000000 # 2 0.1301949 1.7642008 3.0000000 # 3 1.1611944 1.0813007 3.0000000 # 4 0.5509202 1.1874306 3.0000000 str(temp) # 'data.frame': 4 obs. of 6 variables: # $ Temp: Factor w/ 2 levels "10","20": 1 2 1 2 # $ pH : Factor w/ 2 levels "7.6","8.1": 1 1 2 2 # $ Rep : num [1:4, 1:3] 2 2 2 2 1 1 1 1 3 3 ... # $ Var1: num [1:4, 1:3] 4.28 5.84 5.58 6.64 1.35 ... # $ Var2: num [1:4, 1:3] 3.53 6.91 4.12 8.89 1.65 ... # $ Var3: num [1:4, 1:3] 0.153 0.13 1.161 0.551 1.076 ...
Note that when we look at the structure of the output, we find that "Rep", "Var1", etc. are actually matrices. That way you can extract them and cbind them. But this is a little tiring.
I needed to do something similar some time ago, and I ended up just writing a basic wrapper around aggregate that looks like this.
aggregate2 <- function(data, aggs, ids, funs = NULL, ...) { if (identical(aggs, ".")) aggs <- setdiff(names(data), ids) if (identical(ids, ".")) ids <- setdiff(names(data), aggs) if (is.null(funs)) stop("Aggregation function missing") myformula <- as.formula( paste(sprintf("cbind(%s)", paste(aggs, collapse = ", ")), " ~ ", paste(ids, collapse = " + "))) temp <- aggregate( formula = eval(myformula), data = data, FUN = function(x) sapply(seq_along(funs), function(z) eval(call(funs[z], quote(x)))), ...) temp1 <- do.call(cbind, lapply(temp[-c(1:length(ids))], as.data.frame)) names(temp1) <- paste(rep(aggs, each = length(funs)), funs, sep = ".") cbind(temp[1:length(ids)], temp1) }
Here you apply it to the sample data.
(temp2 <- aggregate2(x, ".", c("Temp", "pH"), c("mean", "sd", "length"))) # Temp pH Rep.mean Rep.sd Rep.length Var1.mean Var1.sd Var1.length Var2.mean # 1 10 7.6 2 1 3 4.281195 1.352194 3 3.534447 # 2 20 7.6 2 1 3 5.840411 1.120549 3 6.907273 # 3 10 8.1 2 1 3 5.583853 2.491672 3 4.116622 # 4 20 8.1 2 1 3 6.635154 2.232262 3 8.893188 # Var2.sd Var2.length Var3.mean Var3.sd Var3.length # 1 1.652884 3 0.1529616 1.076276 3 # 2 8.628021 3 0.1301949 1.764201 3 # 3 1.478286 3 1.1611944 1.081301 3 # 4 4.208087 3 0.5509202 1.187431 3
And, structure is what we expect.
str(temp2) # 'data.frame': 4 obs. of 14 variables: # $ Temp : Factor w/ 2 levels "10","20": 1 2 1 2 # $ pH : Factor w/ 2 levels "7.6","8.1": 1 1 2 2 # $ Rep.mean : num 2 2 2 2 # $ Rep.sd : num 1 1 1 1 # $ Rep.length : num 3 3 3 3 # $ Var1.mean : num 4.28 5.84 5.58 6.64 # $ Var1.sd : num 1.35 1.12 2.49 2.23 # $ Var1.length: num 3 3 3 3 # $ Var2.mean : num 3.53 6.91 4.12 8.89 # $ Var2.sd : num 1.65 8.63 1.48 4.21 # $ Var2.length: num 3 3 3 3 # $ Var3.mean : num 0.153 0.13 1.161 0.551 # $ Var3.sd : num 1.08 1.76 1.08 1.19 # $ Var3.length: num 3 3 3 3
If you don't want to use this function, this is the part that specifically deals with working with aggregate output, as applied to the temp object that we created at the beginning of this answer:
temp1 <- do.call(cbind, lapply(temp[-c(1:2)], as.data.frame)) funs <- c("mean", "sd", "length") names(temp1) <- paste(rep(setdiff(names(temp), c("pH", "Temp")), each = length(funs)), funs, sep = ".") cbind(temp[1:2], temp1)
Update: simpler solution
It turns out that you can really do:
do.call(data.frame, aggregate(. ~ Temp + pH, x, function(y) cbind(mean(y), sd(y), length(y))))
The downside here is that the names are less descriptive than the aggregate2 function I shared, but this can be fixed with a fairly simple call to names .