Matching data from data frames with unequal lengths in R

This seems like it should be very simple. I have 2 data frames with unequal lengths in R. One is just a random subset of the larger dataset. Therefore, they contain exactly the same data, and the UniqID values are exactly the same. What I would like to do is add an indicator, 0 or 1, to the larger dataset that says whether each row is also in the smaller dataset.

I can use which(long$UniqID %in% short$UniqID), but I can't figure out how to attach this indicator to the long dataset.

+4
5 answers

I made some sample data.

 long <- data.frame(UniqID = sample(letters[1:20], 20))
 short <- data.frame(UniqID = sample(letters[1:20], 10))

You can use %in% without which() to get TRUE and FALSE values, and then convert them to 0 and 1 with as.numeric().

 long$sh<-as.numeric(long$UniqID %in% short$UniqID) 
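
For comparison, the which() indices from the question can also be used to fill the column. This is only a minimal sketch: it rebuilds the same kind of sample data, the seed is arbitrary, and the column name sh_which is made up for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sample is reproducible
long <- data.frame(UniqID = sample(letters[1:20], 20))
short <- data.frame(UniqID = sample(letters[1:20], 10))

# Start with 0 everywhere, then set the matching rows to 1
idx <- which(long$UniqID %in% short$UniqID)
long$sh_which <- 0
long$sh_which[idx] <- 1

# Compare with converting the logical vector directly
long$sh <- as.numeric(long$UniqID %in% short$UniqID)
identical(long$sh_which, long$sh)
```

Both routes should give the same column; the direct conversion simply skips the intermediate index vector.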
+7

I will use @AnandaMahto's data to illustrate another way, using duplicated, which works whether or not you have a unique identifier.

Case 1: Has a unique id column

 set.seed(1)
 df1 <- data.frame(ID = 1:10, A = rnorm(10), B = rnorm(10))
 df2 <- df1[sample(10, 4), ]
 transform(df1, indicator = 1 * duplicated(rbind(df2, df1)[, "ID", drop = FALSE])[-seq_len(nrow(df2))])

Case 2: Does not have a unique id column

 set.seed(1)
 df1 <- data.frame(A = rnorm(10), B = rnorm(10))
 df2 <- df1[sample(10, 4), ]
 transform(df1, indicator = 1 * duplicated(rbind(df2, df1))[-seq_len(nrow(df2))])
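
To see why the duplicated approach works, the one-liner can be unpacked step by step. This sketch uses the Case 2 data; the intermediate names picked, stacked and dup are only for illustration. It assumes the rows of df1 are themselves unique, since a row repeated within df1 would also be flagged.

```r
set.seed(1)
df1 <- data.frame(A = rnorm(10), B = rnorm(10))
picked <- sample(10, 4)   # row positions that end up in the subset
df2 <- df1[picked, ]

stacked <- rbind(df2, df1)   # subset on top, full data underneath
dup <- duplicated(stacked)   # TRUE for a row already seen higher up
indicator <- 1 * dup[-seq_len(nrow(df2))]   # drop the flags belonging to df2

# Check that the flagged rows are the sampled ones
identical(which(indicator == 1), sort(picked))
```

Putting df2 first is what makes this work: duplicated only flags the second and later occurrences, so the copies inside df1 are the ones marked.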
+7

The answers so far are good. However, a follow-up question was asked: "What if there were no UniqID column?"

In that case, merge can help:

Here is an example of using merge and %in% where the identifier is available:

 set.seed(1)
 df1 <- data.frame(ID = 1:10, A = rnorm(10), B = rnorm(10))
 df2 <- df1[sample(10, 4), ]
 temp <- merge(df1, df2, by = "ID")$ID
 df1$matches <- as.integer(df1$ID %in% temp)

And a similar example where the identifier is not available:

 set.seed(1)
 df1_NoID <- data.frame(A = rnorm(10), B = rnorm(10))
 df2_NoID <- df1_NoID[sample(10, 4), ]
 temp <- merge(df1_NoID, df2_NoID, by = "row.names")$Row.names
 df1_NoID$matches <- as.integer(rownames(df1_NoID) %in% temp)
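
The row-name version relies on df2_NoID keeping the row names it inherited from df1_NoID. If those were reset, one possible fallback is to compare whole rows through a text key. This is a different technique from merge, sketched here under the assumption that the subset holds exact copies of the rows; row_key is a hypothetical helper, not a base R function.

```r
set.seed(1)
df1_NoID <- data.frame(A = rnorm(10), B = rnorm(10))
df2_NoID <- df1_NoID[sample(10, 4), ]
rownames(df2_NoID) <- NULL   # simulate a subset whose row names were reset

# Hypothetical helper: paste all columns of a row into one character key
row_key <- function(d) do.call(paste, c(d, sep = "\r"))

df1_NoID$matches <- as.integer(row_key(df1_NoID) %in% row_key(df2_NoID))
sum(df1_NoID$matches)   # number of rows of df1_NoID found in the subset
```

Because the keys are built from the printed values, this suits exact copies such as a sampled subset; values that differ only by tiny floating-point noise might not be handled as expected.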
+6

You can use the logical vector directly as a new column, multiplying it by 1 to get 0 and 1:

 long$Indicator <- 1*(long$UniqID %in% short$UniqID) 
+4

See if this can get you started:

 long <- data.frame(UniqID = sample(1:100))  # create a long data frame
 short <- data.frame(UniqID = long[sample(1:100, 30), ])  # create a short one with the same ids
 long$indicator <- long$UniqID %in% short$UniqID  # create an indicator column in long

 > head(long)
   UniqID indicator
 1     87      TRUE
 2     15      TRUE
 3    100      TRUE
 4     40     FALSE
 5     89     FALSE
 6     21     FALSE
0