How to make column A unique in R and keep the row with the maximum value in column B

I have a data.frame with 17 columns. Column 2 contains repeated values. For each distinct value I want to keep only one row: the one with the maximum value in column 17.

For instance:

    A    B
    'a'  1
    'a'  2
    'a'  3
    'b'  5
    'b'  200

would return

    A    B
    'a'  3
    'b'  200

(plus the rest of the columns)

So far I have used the unique function, but I think it either keeps one row at random or keeps only the first one that appears.

** UPDATE ** The real data has 376,000 rows. I tried data.table and the other suggestions, but they take forever. Any idea which is the most efficient?

2 answers

A solution using the data.table package:

    set.seed(42)
    dat <- data.frame(A=c('a','a','a','b','b'),B=c(1,2,3,5,200),C=rnorm(5))
    library(data.table)
    dat <- as.data.table(dat)
    dat[,.SD[which.max(B)],by=A]
       A   B         C
    1: a   3 0.3631284
    2: b 200 0.4042683
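Regarding the speed concern in the update: a commonly used alternative with data.table is to collect the row numbers of the per-group maxima via `.I` and subset the table once, instead of building `.SD` for every group. The sketch below reuses the same toy data. The names `idx` and `res` are only illustrative, and whether this is actually faster on the 376,000-row data is an assumption that would need to be timed.

```r
library(data.table)

set.seed(42)
dat <- data.table(A = c('a', 'a', 'a', 'b', 'b'),
                  B = c(1, 2, 3, 5, 200),
                  C = rnorm(5))

# .I holds the row numbers of dat; within each group of A,
# keep the row number of the largest B (first one on ties).
idx <- dat[, .I[which.max(B)], by = A]$V1

# Subset the original table once with the collected row numbers.
res <- dat[idx]
print(res)
```

All other columns are carried along because the final step is an ordinary row subset of `dat`.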

A not very elegant solution using base R functions:

    > ind <- with(dat, tapply(B, A, which.max)) # Using @Roland data
    > mysplit <- split(dat, dat$A)
    > do.call(rbind, lapply(1:length(mysplit), function(i) mysplit[[i]][ind[i],]))
      A   B         C
    3 a   3 0.3631284
    5 b 200 0.4042683
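Another base R possibility, which avoids splitting the data, is to sort so that the largest B comes first within each A and then drop the later duplicates of A. This is only a sketch on the same toy data. The names `sorted` and `res` are illustrative, and it assumes B is numeric so that it can be negated for a descending sort.

```r
set.seed(42)
dat <- data.frame(A = c('a', 'a', 'a', 'b', 'b'),
                  B = c(1, 2, 3, 5, 200),
                  C = rnorm(5))

# Sort by A, then by B in descending order, so the row with the
# maximum B is the first row of each group.
sorted <- dat[order(dat$A, -dat$B), ]

# duplicated() flags every occurrence of A after the first one,
# so negating it keeps exactly the first (maximum-B) row per group.
res <- sorted[!duplicated(sorted$A), ]
print(res)
```

This relies on the fact that `duplicated()` always marks the later occurrences, so the row that survives is determined by the sort order rather than being arbitrary.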
