R, conditionally delete duplicate rows

I have a data frame in R containing the columns ID.A, ID.B and DISTANCE, where DISTANCE is the distance between ID.A and ID.B. For each ID.A value (1 to n) there can be several ID.B and DISTANCE values, i.e. the same ID.A can appear on several rows. For example, ID.A might appear on four rows, each with a different ID.B and DISTANCE.

I would like to be able to delete rows where ID.A is duplicated, but conditionally on the DISTANCE value, so that I am left with only the smallest DISTANCE for each ID.A.

Hope this makes sense?

Thank you very much in advance

EDIT

I hope an example is more useful than my text. Here I would like to delete the second and third of the rows where ID.A = 3 (keeping the one with DISTANCE = 0.4):

 myDF <- read.table(text = "ID.A ID.B DISTANCE
 1 3 1
 2 6 8
 3 2 0.4
 3 3 1
 3 8 5
 4 8 7
 5 2 11", header = TRUE)
3 answers

You can also do this easily in base R. If dat is your data frame:

 do.call(rbind, by(dat, INDICES = list(dat$ID.A),
                   FUN = function(x) head(x[order(x$DISTANCE), ], 1)))
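Applied to the asker's myDF (reconstructed here so the snippet is self-contained, with `dat` standing in for the data frame name used in the answer), this keeps one row per ID.A:

```r
# Reconstruct the example data from the question
dat <- read.table(text = "ID.A ID.B DISTANCE
1 3 1
2 6 8
3 2 0.4
3 3 1
3 8 5
4 8 7
5 2 11", header = TRUE)

# For each ID.A group, sort the group by DISTANCE and keep the first
# (i.e. smallest-distance) row, then bind the groups back together
res <- do.call(rbind, by(dat, INDICES = list(dat$ID.A),
                         FUN = function(x) head(x[order(x$DISTANCE), ], 1)))
res
# one row per ID.A, with DISTANCE values 1, 8, 0.4, 7, 11
```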

One possibility:

 myDF <- myDF[order(myDF$ID.A, myDF$DISTANCE), ]
 newdata <- myDF[which(!duplicated(myDF$ID.A)), ]

Which gives:

   ID.A ID.B DISTANCE
 1    1    3      1.0
 2    2    6      8.0
 5    3    2      0.4
 6    4    8      7.0
 7    5    2     11.0

You can use the plyr package for this. For example, if your data is:

 d <- data.frame(id.a = c(1,1,1,2,2,3,3,3,3),
                 id.b = c(1,2,3,1,2,1,2,3,4),
                 dist = c(12,10,15,20,18,16,17,25,9))

   id.a id.b dist
 1    1    1   12
 2    1    2   10
 3    1    3   15
 4    2    1   20
 5    2    2   18
 6    3    1   16
 7    3    2   17
 8    3    3   25
 9    3    4    9

You can use the ddply function as follows:

 library(plyr)
 ddply(d, "id.a", function(df) return(df[df$dist == min(df$dist), ]))

Which gives:

   id.a id.b dist
 1    1    2   10
 2    2    2   18
 3    3    4    9
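One difference worth noting (my own observation, not from the answers above): filtering on `dist == min(dist)` keeps every row that ties for the minimum, while the `!duplicated()` approach keeps only one row per group. A small base R sketch with made-up tied data (`d2`, `ties_kept` and `one_kept` are illustrative names):

```r
# Hypothetical data where ID.A = 1 has two rows tied at DISTANCE = 10
d2 <- data.frame(ID.A = c(1, 1, 2),
                 ID.B = c(1, 2, 1),
                 DISTANCE = c(10, 10, 5))

# Filtering each group on the minimum keeps BOTH tied rows for ID.A = 1
ties_kept <- do.call(rbind, by(d2, list(d2$ID.A),
                               function(x) x[x$DISTANCE == min(x$DISTANCE), ]))

# The duplicated() approach keeps only the FIRST row per ID.A after sorting
d2o <- d2[order(d2$ID.A, d2$DISTANCE), ]
one_kept <- d2o[!duplicated(d2o$ID.A), ]
```

So if ties are possible in your data, pick the approach that matches whether you want all tied rows or just one.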
