Matching identifiers in two datasets

Question

I have two datasets containing pre and post data. Respondents have unique identifiers, and I want to create a subset that includes only those who answered both polls. Dataset example:

 pre.data <- data.frame(ID = c(1:10), Y = sample(c("yes", "no"), 10, replace = TRUE), Survey = 1)
 post.data <- data.frame(ID = c(1:3,6:10), Y = sample(c("yes", "no"), 8, replace = TRUE), Survey = 2)
 all.data <- rbind(pre.data, post.data)

I have the following function:

 match <- function(dat1, dat2, dat3){
   # dat1 is the whole dataset (both stitched together)
   # dat2 is the pre dataset
   # dat3 is the post dataset
   selectedRows <- (dat1$ID %in% dat2$ID & dat1$ID %in% dat3$ID)
   matchdata <- dat1[selectedRows,]
   return(matchdata)
 }
 prepost.match.data <- match(all.data, pre.data, post.data)

I think there should be a better way to do this than my function, but I can't think of one. My approach seems a bit messy: it works and does what I want, but I can't help thinking there is a cleaner way.

My apologies if this has already been asked in a similar form; I could not find it. If it has, please point me to the corresponding answer.

+6

matching r subset

Froom2 Apr 18 '13 at 14:05

2 answers

Have a look at join in plyr.

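 # type = "inner" keeps only IDs present in both data frames (join defaults to type = "left")
 prepost.match.data <- join(pre.data, post.data, by = "ID", type = "inner")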
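For comparison, base R's merge does an inner join by default, so a similar side-by-side result should be possible without loading plyr. This is only a sketch using the data frames from the question; the set.seed call and the suffixes are my own additions, not something from the original post.

```r
set.seed(1)  # assumption: only here so the sampled answers are reproducible

pre.data  <- data.frame(ID = 1:10, Y = sample(c("yes", "no"), 10, replace = TRUE), Survey = 1)
post.data <- data.frame(ID = c(1:3, 6:10), Y = sample(c("yes", "no"), 8, replace = TRUE), Survey = 2)

# merge() keeps only IDs found in both data frames (all = FALSE is the default).
# The suffixes tell apart the Y and Survey columns coming from each survey.
wide <- merge(pre.data, post.data, by = "ID", suffixes = c(".pre", ".post"))

wide$ID       # the IDs that answered both polls
names(wide)   # one row per respondent, pre and post columns side by side
```

Note that this gives one row per respondent (wide format) rather than the stacked rows the question's function returns, so it depends on which shape you need downstream.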

+3

raynach Apr 18 '13 at 14:11

juba · Accepted Answer · 2013-04-18T14:12:10+0000

Note: Arun posted the same answer in a comment a little earlier than me.

You can use intersect as follows:

 all.data[all.data$ID %in% intersect(pre.data$ID, post.data$ID),]

Which gives:

    ID   Y Survey
 1   1 yes      1
 2   2  no      1
 3   3  no      1
 6   6 yes      1
 7   7 yes      1
 8   8 yes      1
 9   9  no      1
 10 10 yes      1
 11  1  no      2
 12  2 yes      2
 13  3  no      2
 14  6  no      2
 15  7 yes      2
 16  8 yes      2
 17  9  no      2
 18 10 yes      2
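If you need this more than once, the same intersect idea can be wrapped in a small helper. A rough sketch: keep_matched and its argument names are made up for illustration, and the Y values are fixed instead of sampled so the result is deterministic.

```r
# Hypothetical helper: keep the rows of `stacked` whose ID occurs in both `pre` and `post`.
keep_matched <- function(stacked, pre, post, id = "ID") {
  both <- intersect(pre[[id]], post[[id]])          # IDs present in both surveys
  stacked[stacked[[id]] %in% both, , drop = FALSE]  # subset the stacked data
}

# Same ID layout as in the question, with constant answers instead of sample()
pre.data  <- data.frame(ID = 1:10, Y = "yes", Survey = 1)
post.data <- data.frame(ID = c(1:3, 6:10), Y = "no", Survey = 2)
all.data  <- rbind(pre.data, post.data)

matched <- keep_matched(all.data, pre.data, post.data)
table(matched$ID)  # each remaining ID should appear twice, once per survey
```

Passing the ID column name as an argument is just a convenience in case the identifier is called something else in your real data.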
