Delete duplicates based on the state of the second column

Question

I am trying to remove duplicate rows from a data frame, keeping only the row with the maximum value in another column.

So for the data frame:

df <- data.frame(rbind(c("a",2,3),c("a",3,4),c("a",3,5),c("b",1,3),c("b",2,6),c("r",4,5)))
colnames(df) <- c("id","val1","val2")

 id val1 val2
  a    2    3
  a    3    4
  a    3    5
  b    1    3
  b    2    6
  r    4    5

For each id that appears more than once, I would like to remove every row that does not have the maximum value of val2 for that id.

Thus, the data frame should become:

  a    3    5
  b    2    6
  r    4    5

In other words: delete all duplicates, but within each subset such as subset(df, df$id == "a"), keep the row with the maximum value of df$val2.

+4

r

agatha 21 sept '14 at 19:15


3 answers

One possible way is to use data.table

library(data.table)
setDT(df)[, .SD[which.max(val2)], by = id]
##    id val1 val2
## 1:  a    3    5
## 2:  b    2    6
## 3:  r    4    5

+5

David Arenburg 21 Sept '14 at 19:20

Another option, with the data rebuilt using numeric columns:

df <- data.frame (id = c(rep("a", 3), rep("b", 2), "r"),
                  val1 = c(2, 3, 3, 1, 2, 4), val2 = c(3, 4, 5, 3, 6, 5))

You could use split and unsplit:

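# Split by id, keep the row with the largest val2 in each piece
# (which.max returns the first maximum, so ties yield a single row),
# then reassemble the pieces into one data frame.
unsplit(lapply(split(df, df$id), function(x) {
    x[which.max(x$val2), ]
}), unique(df$id))
#   id val1 val2
# 3  a    3    5
# 5  b    2    6
# 6  r    4    5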

You could also use Reduce(rbind, ...) or do.call(rbind, ...) in place of unsplit.
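As a rough sketch of the do.call(rbind, ...) variant mentioned here, assuming the numeric data frame defined earlier in this answer; the names `pieces` and `res` are illustrative only:

```r
# Same data as in this answer
df <- data.frame(id = c(rep("a", 3), rep("b", 2), "r"),
                 val1 = c(2, 3, 3, 1, 2, 4), val2 = c(3, 4, 5, 3, 6, 5))

# One single-row data frame per id: the row with the largest val2
pieces <- lapply(split(df, df$id), function(x) x[which.max(x$val2), ])

# Stack the per-id rows back into one data frame
res <- do.call(rbind, pieces)
```

Unlike unsplit, this does not restore the original row names, which may or may not matter for your use.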

+2

Rich Scriven 21 Sept '14 at 19:28

akrun · Accepted Answer · 2014-09-21T19:22:11+0000

Using base R. Here the columns are factors, so remember to convert val2 to numeric first:

 df$val2 <- as.numeric(as.character(df$val2))
 df[with(df, ave(val2, id, FUN=max)==val2),]
 #  id val1 val2
 #3  a    3    5
 #5  b    2    6
 #6  r    4    5

Or using dplyr

 library(dplyr)
 df %>% 
    group_by(id) %>% 
    filter(val2==max(val2))
 #   id val1 val2
 #1  a    3    5
 #2  b    2    6
 #3  r    4    5
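
Both filters in this answer keep every row that ties for the group maximum, so an id can still appear more than once when val2 has ties. A minimal base R sketch of one way to keep exactly one row per id, assuming it does not matter which tied row survives; `df2`, `all_max`, `sorted`, and `one_max` are made-up names and the data is hypothetical, not from the question:

```r
# Hypothetical data with a tie for the maximum val2 within id "a"
df2 <- data.frame(id   = c("a", "a", "b"),
                  val1 = c(1, 2, 3),
                  val2 = c(5, 5, 4))

# The ave() filter keeps every row that attains the group maximum
all_max <- df2[with(df2, ave(val2, id, FUN = max) == val2), ]

# Sort by id, then by val2 descending, and keep the first row seen for each id
sorted  <- df2[order(df2$id, -df2$val2), ]
one_max <- sorted[!duplicated(sorted$id), ]
```

Here `all_max` still contains both "a" rows, while `one_max` has a single row per id.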
