Replace the missing values (NA) in one data set with the values from the other, where the columns match

Question

Replace the missing values (NA) in one data set with the values from the other, where the columns match

I have a data frame (datadf) with 3 columns, 'x', 'y and z. Several values of "x" are missing ( NA ). 'y' and 'z' are not measured variables.

 xyz 153 a 1 163 b 1 NA d 1 123 a 2 145 e 2 NA c 2 NA b 1 199 a 2

I have another data frame (imputeddf) with the same three columns:

  xyz 123 a 1 145 a 2 124 b 1 168 b 2 123 c 1 176 c 2 184 d 1 101 d 2

I want to replace NA in 'x' in 'datadf' with the values from 'imputeddf', where 'y' and 'z' match between two datasets (each combination of 'y' and 'z' has its own value 'x', to fill in).

Desired Result:

 xyz 153 a 1 163 b 1 184 d 1 123 a 2 145 e 2 176 c 2 124 b 1 199 a 2

I am trying things like:

 finaldf <- datadf finaldf$x <- if(datadf[!is.na(datadf$x)]){ddply(datadf, x=imputeddf$x[datadf$y == imputeddf$y & datadf$z == imputeddf$z])}else{datadf$x}

but it does not work.

What is the best way to populate NA when using my imputed df value?

+7

r aggregate plyr melt

user3337754 Sep 17 '15 at 19:51

source share

2 answers

Here's an alternative with a R base:

 df1[is.na(df1$x),"x"] <- merge(df2,df1[is.na(df1$x),][,c("y","z")])$x > df1 # xyz #1 153 a 1 #2 163 b 1 #3 124 b 1 #4 123 a 2 #5 145 e 2 #6 176 c 2 #7 184 d 1 #8 199 a 2

+3

Rhertel Sep 17 '15 at 20:21

source share

Frank · Accepted Answer · 2015-09-17T20:12:45+0000

I would do this:

 library(data.table) setDT(DF1); setDT(DF2) DF1[DF2, x := ifelse(is.na(x), ix, x), on=c("y","z")]

which gives

  xyz 1: 153 a 1 2: 163 b 1 3: 184 d 1 4: 123 a 2 5: 145 e 2 6: 176 c 2 7: 124 b 1 8: 199 a 2

Comments This approach is not so great because it combines all of DF1 , and we only need to combine the subset, where is.na(x) . Here the improvement looks (thanks, @Arun):

 DF1[is.na(x), x := DF2[.SD, x, on=c("y", "z")]]

This method is similar to @RHertel's answer.

Replace the missing values ​​(NA) in one data set with the values ​​from the other, where the columns match

More articles:

Replace the missing values (NA) in one data set with the values from the other, where the columns match