Replace the missing values ​​(NA) in one data set with the values ​​from the other, where the columns match

I have a data frame (datadf) with 3 columns, 'x', 'y and z. Several values ​​of "x" are missing ( NA ). 'y' and 'z' are not measured variables.

 xyz 153 a 1 163 b 1 NA d 1 123 a 2 145 e 2 NA c 2 NA b 1 199 a 2 

I have another data frame (imputeddf) with the same three columns:

  xyz 123 a 1 145 a 2 124 b 1 168 b 2 123 c 1 176 c 2 184 d 1 101 d 2 

I want to replace NA in 'x' in 'datadf' with the values ​​from 'imputeddf', where 'y' and 'z' match between two datasets (each combination of 'y' and 'z' has its own value 'x', to fill in).

Desired Result:

 xyz 153 a 1 163 b 1 184 d 1 123 a 2 145 e 2 176 c 2 124 b 1 199 a 2 

I am trying things like:

 finaldf <- datadf finaldf$x <- if(datadf[!is.na(datadf$x)]){ddply(datadf, x=imputeddf$x[datadf$y == imputeddf$y & datadf$z == imputeddf$z])}else{datadf$x} 

but it does not work.

What is the best way to populate NA when using my imputed df value?

+7
source share
2 answers

I would do this:

 library(data.table) setDT(DF1); setDT(DF2) DF1[DF2, x := ifelse(is.na(x), ix, x), on=c("y","z")] 

which gives

  xyz 1: 153 a 1 2: 163 b 1 3: 184 d 1 4: 123 a 2 5: 145 e 2 6: 176 c 2 7: 124 b 1 8: 199 a 2 

Comments This approach is not so great because it combines all of DF1 , and we only need to combine the subset, where is.na(x) . Here the improvement looks (thanks, @Arun):

 DF1[is.na(x), x := DF2[.SD, x, on=c("y", "z")]] 

This method is similar to @RHertel's answer.

+6
source

Here's an alternative with a R base:

 df1[is.na(df1$x),"x"] <- merge(df2,df1[is.na(df1$x),][,c("y","z")])$x > df1 # xyz #1 153 a 1 #2 163 b 1 #3 124 b 1 #4 123 a 2 #5 145 e 2 #6 176 c 2 #7 184 d 1 #8 199 a 2 
+3
source

All Articles