How to match missing identifiers?

Question

I have a large table with 50,000 rows. The following mimics its structure:

    ID <- c(1,2,3,4,5,6,7,8,9)
    a <- c("A","B",NA,"D","E",NA,"G","H","I")
    b <- c(11,2233,12,2,22,13,23,23,100)
    c <- c(12,10,12,23,16,17,7,9,7)
    df <- data.frame(ID, a, b, c)

There are some missing values in the vector "a". However, I have several tables that include the identifier and the missing values:

    ID <- c(1,2,3,4,5,6,7,8,9)
    a <- c("A","B","C","D","E","F","G","H","I")
    key <- data.frame(ID, a)

Is there a way to fill in the missing values in column a from the key, using the identifier?

+6

r

user3833190 Jul 01 '15 at 8:24

3 answers

You can just use match; however, I would recommend that both of your datasets use character columns instead of factors to prevent headaches later.

    key$a <- as.character(key$a)
    df$a <- as.character(df$a)
    df$a[is.na(df$a)] <- key$a[match(df$ID[is.na(df$a)], key$ID)]
    df
    #   ID a    b  c
    # 1  1 A   11 12
    # 2  2 B 2233 10
    # 3  3 C   12 12
    # 4  4 D    2 23
    # 5  5 E   22 16
    # 6  6 F   13 17
    # 7  7 G   23  7
    # 8  8 H   23  9
    # 9  9 I  100  7

Of course, you can always stick with factors, take the entire "ID" column, and use labels to replace the values in the "a" column:

    factor(df$ID, levels = key$ID, labels = key$a)
    ## [1] A B C D E F G H I
    ## Levels: A B C D E F G H I

Assign the result to df$a and you're done.
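Since the question mentions several key tables, the match approach can be extended to that case by stacking the keys first. This is only a sketch: key1 and key2 are hypothetical tables invented for illustration, and it assumes each ID appears in at most one of them.

```r
# Example data from the question; stringsAsFactors = FALSE keeps "a" as character
df <- data.frame(
  ID = 1:9,
  a  = c("A", "B", NA, "D", "E", NA, "G", "H", "I"),
  b  = c(11, 2233, 12, 2, 22, 13, 23, 23, 100),
  c  = c(12, 10, 12, 23, 16, 17, 7, 9, 7),
  stringsAsFactors = FALSE
)

# Hypothetical key tables, each covering part of the IDs
key1 <- data.frame(ID = 1:5, a = c("A", "B", "C", "D", "E"),
                   stringsAsFactors = FALSE)
key2 <- data.frame(ID = 6:9, a = c("F", "G", "H", "I"),
                   stringsAsFactors = FALSE)

# Stack the keys into a single lookup table
key_all <- rbind(key1, key2)

# Fill only the rows where "a" is missing
missing_rows <- is.na(df$a)
df$a[missing_rows] <- key_all$a[match(df$ID[missing_rows], key_all$ID)]

df$a
```

An ID that appears in none of the key tables simply stays NA, because match returns NA for it.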

+1

A5C1D2H2I1M1N2O1R2T1 Jul 01 '15 at 8:34

Named vectors create nice lookup tables:

    lookup <- a
    names(lookup) <- as.character(ID)

lookup is now a named vector; you can access each value by indexing with its ID, for example lookup["2"] (make sure the ID is a character string, not a number).

    ## should give you a vector of a as required.
    lookup[as.character(ID_from_big_table)]
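Applied to the data from the question, the named-vector lookup might look like the sketch below. It assumes key$a is a character column, and it fills only the NA entries rather than overwriting the whole column; missing_rows is a helper name introduced here.

```r
# Example data from the question (only the columns needed here)
df <- data.frame(
  ID = 1:9,
  a  = c("A", "B", NA, "D", "E", NA, "G", "H", "I"),
  stringsAsFactors = FALSE
)
key <- data.frame(
  ID = 1:9,
  a  = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  stringsAsFactors = FALSE
)

# Build the lookup: values are key$a, names are the IDs as strings
lookup <- setNames(key$a, as.character(key$ID))

# Index by name (character), not by position, and fill only the NAs
missing_rows <- is.na(df$a)
df$a[missing_rows] <- unname(lookup[as.character(df$ID[missing_rows])])

df$a
```

Converting the IDs with as.character matters because a numeric index such as lookup[2] selects by position, which matches the ID only when the IDs happen to run 1, 2, 3, ... in order.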

0

MarkeD Jul 01 '15 at 8:31

David Arenburg · Accepted Answer · 2015-07-01T08:46:12+0000

Another option is to use data.table's fast binary join and update by reference:

    library(data.table)
    setkey(setDT(df), ID)[key, a := i.a]
    df
    #    ID a    b  c
    # 1:  1 A   11 12
    # 2:  2 B 2233 10
    # 3:  3 C   12 12
    # 4:  4 D    2 23
    # 5:  5 E   22 16
    # 6:  6 F   13 17
    # 7:  7 G   23  7
    # 8:  8 H   23  9
    # 9:  9 I  100  7

If you want to replace only the NAs (rather than every joined row), a slightly more involved implementation would be

    setkey(setDT(key), ID)
    setkey(setDT(df), ID)[is.na(a), a := key[.SD, a]]
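A variant of the NA-only update that avoids setting keys is sketched below. It is not the code from the answer: it joins with on = and uses fifelse, which is only available in recent data.table versions, and the names dt and key_dt are introduced here for illustration.

```r
library(data.table)

# Example data from the question, built directly as data.tables
dt <- data.table(
  ID = 1:9,
  a  = c("A", "B", NA, "D", "E", NA, "G", "H", "I"),
  b  = c(11, 2233, 12, 2, 22, 13, 23, 23, 100),
  c  = c(12, 10, 12, 23, 16, 17, 7, 9, 7)
)
key_dt <- data.table(
  ID = 1:9,
  a  = c("A", "B", "C", "D", "E", "F", "G", "H", "I")
)

# Update join by reference: for each row of dt matched on ID,
# keep the existing a, and take the key's value (i.a) only where a is NA
dt[key_dt, on = "ID", a := fifelse(is.na(a), i.a, a)]

dt$a
```

Inside the join, a refers to the column in dt and i.a to the column in key_dt, so existing non-missing values are left untouched.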
