How to get the maximum value for a group

I have data.frame with two columns: year and score . Years go from 2000-2012 and each year can be listed several times. In the rating column, I list all the points for each year, each of which has a different score.

What I would like to do is a data.frame filter, so only the rows with the maximum number of points for each year remain.

So, as a tiny example, if I have

 year score 2000 18 2001 22 2000 21 

I would like to return only

 year score 2001 22 2000 21 
+4
source share
5 answers

If you know sql, it's easier to understand

 library(sqldf) sqldf('select year, max(score) from mydata group by year') 

Update (2016-01): now you can also use dplyr

 library(dplyr) mydata %>% group_by(year) %>% summarise(max = max(score)) 
+7
source

using plyr

 require(plyr) set.seed(45) df <- data.frame(year=sample(2000:2012, 25, replace=T), score=sample(25)) ddply(df, .(year), summarise, max.score=max(score)) 

using data.table

 require(data.table) dt <- data.table(df, key="year") dt[, list(max.score=max(score)), by=year] 

using aggregate :

 o <- aggregate(df$score, list(df$year) , max) names(o) <- c("year", "max.score") 

using ave :

 df1 <- df df1$max.score <- ave(df1$score, df1$year, FUN=max) df1 <- df1[!duplicated(df1$year), ] 

Edit: In the case of more columns, a data.table solution would be better (my opinion :))

 set.seed(45) df <- data.frame(year=sample(2000:2012, 25, replace=T), score=sample(25), alpha = sample(letters[1:5], 25, replace=T), beta=rnorm(25)) # convert to data.table with key=year dt <- data.table(df, key="year") # get the subset of data that matches this criterion dt[, .SD[score %in% max(score)], by=year] # year score alpha beta # 1: 2000 20 b 0.8675148 # 2: 2001 21 e 1.5543102 # 3: 2002 22 c 0.6676305 # 4: 2003 18 a -0.9953758 # 5: 2004 23 d 2.1829996 # 6: 2005 25 b -0.9454914 # 7: 2007 17 e 0.7158021 # 8: 2008 12 e 0.6501763 # 9: 2011 24 a 0.7201334 # 10: 2012 19 d 1.2493954 
+7
source

using basic packages

 > df year score 1 2000 18 2 2001 22 3 2000 21 > aggregate(score ~ year, data=df, max) year score 1 2000 21 2 2001 22 

EDIT

If you have additional columns that you need to keep, you can merge specify those columns with aggregate

 > df <- data.frame(year = c(2000, 2001, 2000), score = c(18, 22, 21) , hrs = c( 10, 11, 12)) > df year score hrs 1 2000 18 10 2 2001 22 11 3 2000 21 12 > merge(aggregate(score ~ year, data=df, max), df, all.x=T) year score hrs 1 2000 21 12 2 2001 22 11 
+5
source
 data <- data.frame(year = c(2000, 2001, 2000), score = c(18, 22, 21)) new.year <- unique(data$year) new.score <- sapply(new.year, function(y) max(data[data$year == y, ]$score)) data <- data.frame(year = new.year, score = new.score) 
+1
source

one insert

 df_2<-data.frame(year=sort(unique(df$year)),score = tapply(df$score,df$year,max)); 
0
source

All Articles