Stata has a very convenient egen command that makes it easy to compute statistics for groups of observations. For example, you can calculate the maximum, mean, and minimum of a variable within each group and add each result as a new variable to the observation-level data set. In Stata this takes one line of code:
    bysort group: egen max = max(x)
I have never found an equivalent command in R. The summarise() function in the dplyr package makes it easy to calculate statistics for each group, but I then need a loop to attach each group's statistic back to its observations:
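    library("dplyr")

    # 1000 observations spread across 100 groups
    N <- 1000
    tf <- data.frame(group = sample(1:100, size = N, replace = TRUE),
                     x = rnorm(N))
    table(tf$group)

    # one row per group holding the group maximum
    mtf <- summarise(group_by(tf, group), max = max(x))

    # copy each group's maximum back to the observations in that group
    tf$max <- NA
    for (i in seq_len(nrow(mtf))) {
      tf$max[tf$group == mtf$group[i]] <- mtf$max[i]
    }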
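For comparison, the loop step on its own (looking up each observation's group in mtf) can be written as a single join. This is a minimal sketch, assuming dplyr 1.0 or later; left_join() is a different technique from the loop, it keeps the rows of tf in their original order, and the name tf2 and the set.seed() call are only illustrative additions. It still needs the separate summary table, so it is not a one-line equivalent of egen.

```r
library("dplyr")

set.seed(1)  # illustrative: makes the toy data reproducible
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE),
                 x = rnorm(N))

# per-group summary, as in the question
mtf <- summarise(group_by(tf, group), max = max(x))

# attach the group maximum to every observation by joining on group,
# instead of looping over the rows of mtf
tf2 <- left_join(tf, mtf, by = "group")
head(tf2)
```

Because left_join() keeps every row of tf, tf2 has the same N rows as tf, with the group maximum repeated for each observation in a group.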
Does anyone have a better solution?
Pac