R, conditionally delete duplicate rows

I have a data frame in R containing the columns ID.A, ID.B and DISTANCE, where DISTANCE is the distance between ID.A and ID.B. For each ID.A value (1 to n) there can be several ID.B and DISTANCE values, i.e. the same ID.A can appear on several rows. For example, ID.A might appear on four rows, each with a different ID.B and DISTANCE.

I would like to be able to delete rows where ID.A is duplicated, but conditionally on the DISTANCE value, so that I am left with only the smallest DISTANCE for each ID.A.

Hope this makes sense?

Thank you very much in advance

EDIT

I hope an example is more useful than my text. Here I would like to delete the second and third of the rows where ID.A = 3 (keeping the one with DISTANCE = 0.4):

 myDF <- read.table(text = "ID.A ID.B DISTANCE
 1 3 1
 2 6 8
 3 2 0.4
 3 3 1
 3 8 5
 4 8 7
 5 2 11", header = TRUE)
3 answers

You can also do this easily in base R. If dat is your data frame:

 do.call(rbind, by(dat, INDICES = list(dat$ID.A),
                   FUN = function(x) head(x[order(x$DISTANCE), ], 1)))
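Applied to the asker's myDF (reconstructed here so the snippet is self-contained, with `dat` standing in for the data frame name used in the answer), this keeps one row per ID.A:

```r
# Reconstruct the example data from the question
dat <- read.table(text = "ID.A ID.B DISTANCE
1 3 1
2 6 8
3 2 0.4
3 3 1
3 8 5
4 8 7
5 2 11", header = TRUE)

# For each ID.A group, sort the group by DISTANCE and keep the first
# (i.e. smallest-distance) row, then bind the groups back together
res <- do.call(rbind, by(dat, INDICES = list(dat$ID.A),
                         FUN = function(x) head(x[order(x$DISTANCE), ], 1)))
res
# one row per ID.A, with DISTANCE values 1, 8, 0.4, 7, 11
```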

One possibility:

 myDF <- myDF[order(myDF$ID.A, myDF$DISTANCE), ]
 newdata <- myDF[which(!duplicated(myDF$ID.A)), ]

Which gives:

   ID.A ID.B DISTANCE
 1    1    3      1.0
 2    2    6      8.0
 5    3    2      0.4
 6    4    8      7.0
 7    5    2     11.0

You can use the plyr package for this. For example, if your data is:

 d <- data.frame(id.a = c(1,1,1,2,2,3,3,3,3),
                 id.b = c(1,2,3,1,2,1,2,3,4),
                 dist = c(12,10,15,20,18,16,17,25,9))

   id.a id.b dist
 1    1    1   12
 2    1    2   10
 3    1    3   15
 4    2    1   20
 5    2    2   18
 6    3    1   16
 7    3    2   17
 8    3    3   25
 9    3    4    9

You can use the ddply function as follows:

 library(plyr)
 ddply(d, "id.a", function(df) return(df[df$dist == min(df$dist), ]))

Which gives:

   id.a id.b dist
 1    1    2   10
 2    2    2   18
 3    3    4    9
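One difference worth noting (my own observation, not from the answers above): filtering on `dist == min(dist)` keeps every row that ties for the minimum, while the `!duplicated()` approach keeps only one row per group. A small base R sketch with made-up tied data (`d2`, `ties_kept` and `one_kept` are illustrative names):

```r
# Hypothetical data where ID.A = 1 has two rows tied at DISTANCE = 10
d2 <- data.frame(ID.A = c(1, 1, 2),
                 ID.B = c(1, 2, 1),
                 DISTANCE = c(10, 10, 5))

# Filtering each group on the minimum keeps BOTH tied rows for ID.A = 1
ties_kept <- do.call(rbind, by(d2, list(d2$ID.A),
                               function(x) x[x$DISTANCE == min(x$DISTANCE), ]))

# The duplicated() approach keeps only the FIRST row per ID.A after sorting
d2o <- d2[order(d2$ID.A, d2$DISTANCE), ]
one_kept <- d2o[!duplicated(d2o$ID.A), ]
```

So if ties are possible in your data, pick the approach that matches whether you want all tied rows or just one.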
