Matching data from data frames of unequal length in R

Question


This seems like it should be very simple. I have 2 data frames of unequal length in R. One is just a random subset of the larger dataset, so they contain exactly the same data and the UniqID values are exactly the same. What I would like to do is add an indicator column (0 or 1) to the larger dataset that says whether each row is also in the smaller dataset.

I can use which(long$UniqID %in% short$UniqID), but I can't figure out how to attach this indicator to the long dataset.

+4

r match

Kerry Apr 23 '13 at 8:51


5 answers

I will use @AnandaMahto's data to illustrate another way, using duplicated, which works whether or not you have a unique identifier.

Case 1: has a unique id column

    set.seed(1)
    df1 <- data.frame(ID = 1:10, A = rnorm(10), B = rnorm(10))
    df2 <- df1[sample(10, 4), ]
    transform(df1, indicator = 1 * duplicated(rbind(df2, df1)[, "ID", drop=FALSE])[-seq_len(nrow(df2))])

Case 2: Does not have a unique id column

    set.seed(1)
    df1 <- data.frame(A = rnorm(10), B = rnorm(10))
    df2 <- df1[sample(10, 4), ]
    transform(df1, indicator = 1 * duplicated(rbind(df2, df1))[-seq_len(nrow(df2))])
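To see why this works, it may help to run the steps one at a time on a tiny hand-made data frame. This is only a sketch; the names `big` and `small` and their values are made up for illustration and are not part of the answer above.

```r
# Hypothetical toy data: `small` holds rows 2 and 4 of `big`
big   <- data.frame(A = c(10, 20, 30, 40), B = c("w", "x", "y", "z"))
small <- big[c(2, 4), ]

# Stack `small` on top of `big`, so every row of `big` that also
# occurs in `small` is a repeat of a row further up
stacked <- rbind(small, big)

# duplicated() flags each row that has already appeared above it
dup <- duplicated(stacked)

# Drop the flags that belong to `small` itself; what remains lines up
# one-to-one with the rows of `big`
indicator <- 1 * dup[-seq_len(nrow(small))]
indicator
```

Note that this relies on the rows of `big` being distinct: a row repeated within `big` would be flagged as a duplicate too, even if it never occurs in `small`.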

+7

Arun Apr 23 '13 at 9:34


The answers so far are good. However, a follow-up question was asked: "What if there were no UniqID column?"

That is where merge can help:

Here is an example of using merge and %in% where the identifier is available:

    set.seed(1)
    df1 <- data.frame(ID = 1:10, A = rnorm(10), B = rnorm(10))
    df2 <- df1[sample(10, 4), ]
    temp <- merge(df1, df2, by = "ID")$ID
    df1$matches <- as.integer(df1$ID %in% temp)

And a similar example when the identifier is not available.

    set.seed(1)
    df1_NoID <- data.frame(A = rnorm(10), B = rnorm(10))
    df2_NoID <- df1_NoID[sample(10, 4), ]
    temp <- merge(df1_NoID, df2_NoID, by = "row.names")$Row.names
    df1_NoID$matches <- as.integer(rownames(df1_NoID) %in% temp)
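The row-name version works because subsetting a data frame with `[` keeps the original row names, so each row of the subset still carries the label it had in the full data. A minimal sketch of that, with made-up names (`full`, `part`) and invented values:

```r
# Hypothetical toy data with no ID column
full <- data.frame(A = c(1.5, 2.5, 3.5, 4.5, 5.5))
part <- full[c(5, 2), , drop = FALSE]

# The subset keeps the row names it had in `full`
rownames(part)

# So the row names can stand in for the missing identifier
full$matches <- as.integer(rownames(full) %in% rownames(part))
full
```

This only holds while the smaller data frame still has its original row names. If they have been reset (for example with `rownames(part) <- NULL`), comparing whole rows with `duplicated` is the safer choice.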

+6

A5C1D2H2I1M1N2O1R2T1 Apr 23 '13 at 9:14


You can directly use a logical vector as a new column:

 long$Indicator <- 1*(long$UniqID %in% short$UniqID)
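For what it's worth, the `which()` call from the question can be made to work as well; it just takes an extra step, because `which()` returns row positions rather than one value per row. A small sketch, assuming the column name from the question and invented values:

```r
# Invented example data using the column name from the question
long  <- data.frame(UniqID = c("a", "b", "c", "d", "e"))
short <- data.frame(UniqID = c("d", "b"))

# Logical vector with one element per row of `long`
in_short <- long$UniqID %in% short$UniqID

# Direct route: turn TRUE/FALSE into 1/0
long$Indicator <- 1 * in_short

# which() route: start from all zeros, then set the matching positions
long$Indicator2 <- 0
long$Indicator2[which(in_short)] <- 1

long
```

Both columns come out the same, so the direct route is usually the simpler one.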

+4

Nishanth Apr 23 '13 at 8:56


See if this gets you started:

    long <- data.frame(UniqID=sample(1:100))               # creating a long data frame
    short <- data.frame(UniqID=long[sample(1:100, 30), ])  # creating a short one with the same ids
    long$indicator <- long$UniqID %in% short$UniqID        # creating an indicator column in long
    head(long)
    #   UniqID indicator
    # 1     87      TRUE
    # 2     15      TRUE
    # 3    100      TRUE
    # 4     40     FALSE
    # 5     89     FALSE
    # 6     21     FALSE

0

zelite Apr 23 '13 at 9:03


Didzis elferts · Accepted Answer · 2013-04-23T08:56:40+0000

I made some sample data.

    long <- data.frame(UniqID = sample(letters[1:20], 20))
    short <- data.frame(UniqID = sample(letters[1:20], 10))

You can use %in% without which() to get TRUE and FALSE values, and then use as.numeric() to convert them to 1 and 0.

 long$sh<-as.numeric(long$UniqID %in% short$UniqID)
