Merge with replacement of NA values

I am trying to combine two data frames on a unique identifier and year. In SQL terms, I want a left outer join, which in merge is all.x = TRUE. Data frame y does not contain every (Id, year) combination that appears in data frame x. Where there is no match, I want to use the row from y that has the same Id as the row in x, taken from the most recent year available before the missing one. Any suggestions on how to approach this merge? Many thanks!

Edit: an example to make this more specific.

Dataframe x:

    Id year var1
     1 2010  100
     1 2011  105
     1 2012  110
     2 2010  100
     2 2011  105
     2 2012  106

Dataframe y:

    Id year var2 var3
     1 2010    5    7
     1 2011   10    8
     2 2010    9    6

Desired merge result:

    Id year var1 var2 var3
     1 2010  100    5    7
     1 2011  105   10    8
     1 2012  110   10    8
     2 2010  100    9    6
     2 2011  105    9    6
     2 2012  106    9    6
2 answers

I would do this in two steps:

    > out <- merge(x, y, all.x = TRUE)
    > out
      Id year var1 var2 var3
    1  1 2010  100    5    7
    2  1 2011  105   10    8
    3  1 2012  110   NA   NA
    4  2 2010  100    9    6
    5  2 2011  105   NA   NA
    6  2 2012  106   NA   NA

Then use na.locf from the zoo package:

    > library(zoo)
    > apply(out, 2, na.locf)
         Id year var1 var2 var3
    [1,]  1 2010  100    5    7
    [2,]  1 2011  105   10    8
    [3,]  1 2012  110   10    8
    [4,]  2 2010  100    9    6
    [5,]  2 2011  105    9    6
    [6,]  2 2012  106    9    6

The result is a matrix, which can be coerced back to a data.frame:

    > as.data.frame(apply(out, 2, na.locf))
      Id year var1 var2 var3
    1  1 2010  100    5    7
    2  1 2011  105   10    8
    3  1 2012  110   10    8
    4  2 2010  100    9    6
    5  2 2011  105    9    6
    6  2 2012  106    9    6
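One caveat with filling down whole columns: the fill pays no attention to Id, so if the earliest year for some Id had no match in y, that row would presumably pick up values from the previous Id. A minimal sketch of a per-Id fill in base R follows, assuming the rows are sorted by Id and year. The helper name fill_forward is made up for illustration and is not part of zoo or base R.

```r
# Example data from the question
x <- data.frame(Id   = c(1, 1, 1, 2, 2, 2),
                year = c(2010, 2011, 2012, 2010, 2011, 2012),
                var1 = c(100, 105, 110, 100, 105, 106))
y <- data.frame(Id   = c(1, 1, 2),
                year = c(2010, 2011, 2010),
                var2 = c(5, 10, 9),
                var3 = c(7, 8, 6))

# Left outer join, then make sure rows are ordered by Id and year
out <- merge(x, y, all.x = TRUE)
out <- out[order(out$Id, out$year), ]

# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs (nothing earlier to carry) are left as NA.
fill_forward <- function(v) {
  idx <- cumsum(!is.na(v))        # position of the last non-NA seen so far
  c(NA, v[!is.na(v)])[idx + 1]    # idx 0 maps to the NA placeholder
}

# Apply the fill separately within each Id
for (col in c("var2", "var3")) {
  out[[col]] <- ave(out[[col]], out$Id, FUN = fill_forward)
}
out
```

With this approach an Id whose first year is missing from y keeps NA in that row instead of borrowing values from another Id, and out stays a data.frame throughout.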

This does not use merge, but iterates through the rows of x one by one to find the matching row in y. It is probably inefficient, but it works.

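    do.call(rbind, lapply(seq_len(nrow(x)), function(r) {
      # rows of y with the same Id as row r of x
      yid <- y[y$Id == x$Id[r], ]
      # years between the x row and each candidate y row
      yeardiff <- x$year[r] - yid$year
      # ignore y rows from later years
      yeardiff[yeardiff < 0] <- NA
      # attach the y row from the same year or the closest earlier year
      cbind(x[r, ], yid[which.min(yeardiff), ])
    }))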

Result:

      Id year var1 Id year var2 var3
    1  1 2010  100  1 2010    5    7
    2  1 2011  105  1 2011   10    8
    3  1 2012  110  1 2011   10    8
    4  2 2010  100  2 2010    9    6
    5  2 2011  105  2 2010    9    6
    6  2 2012  106  2 2010    9    6

Source: https://habr.com/ru/post/923695/

