I have the following three data frames:
df1 <- data.frame(name=c("John", "Anne", "Christine", "Andy"), age=c(31, 26, 54, 48), height=c(180, 175, 160, 168), group=c("Student",3,5,"Employer"), stringsAsFactors=FALSE)
df2 <- data.frame(name=c("Anne", "Christine"), age=c(26, 54), height=c(175, 160), group=c(3,5), group2=c("Teacher",6), stringsAsFactors=FALSE)
df3 <- data.frame(name=c("Christine"), age=c(54), height=c(160), group=c(5), group2=c(6), group3=c("Scientist"), stringsAsFactors=FALSE)
I would like to combine them to get the following result:
df.all <- data.frame(name=c("John", "Anne", "Christine", "Andy"), age=c(31, 26, 54, 48), height=c(180, 175, 160, 168), group=c("Student", "Teacher", "Scientist", "Employer"))
At the moment, I am doing it like this:
df.all <- merge(merge(df1[,c(1,4)], df2[,c(1,5)], all=TRUE, by="name"), df3[,c(1,6)], all=TRUE, by="name")
row.ind <- which(df.all$group %in% c(6,5))
df.all[row.ind, c("group")] <- df.all[row.ind, c("group2")]
row.ind2 <- which(df.all$group2 %in% c(6))
df.all[row.ind2, c("group")] <- df.all[row.ind2, c("group3")]
This does not generalize, and it is really messy. Maybe merge_all or merge_recurse could handle the merge step (especially since more than two data frames have to be merged), but I could not work out how to use them. Neither of these two attempts gives the correct result:
df.all <- merge_all(list(df1, df2, df3))
df.all <- merge_recurse(list(df1, df2, df3), by=c("name"))
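For the merge step alone, base R's Reduce might be enough. This is only a minimal sketch, assuming the third frame is named df3, that name is always the first column, and that the last column of each frame is the one to carry over (the frames list and the merged name are just placeholders, and the age and height columns are left out for brevity):

```r
# Trimmed copies of the example frames (age and height omitted for brevity).
df1 <- data.frame(name = c("John", "Anne", "Christine", "Andy"),
                  group = c("Student", 3, 5, "Employer"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("Anne", "Christine"),
                  group = c(3, 5),
                  group2 = c("Teacher", 6),
                  stringsAsFactors = FALSE)
df3 <- data.frame(name = "Christine",
                  group = 5,
                  group2 = 6,
                  group3 = "Scientist",
                  stringsAsFactors = FALSE)

# Keep only the key (first column) and the last column of each frame.
frames <- lapply(list(df1, df2, df3), function(d) d[, c(1, ncol(d))])

# Fold a full outer join on "name" over the list, however long it is.
merged <- Reduce(function(x, y) merge(x, y, by = "name", all = TRUE), frames)

print(merged)
```

This gives one row per name with the columns group, group2 and group3 side by side, but it still leaves the step of replacing the numeric codes with the values from the later columns unsolved.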
Is there a more general and elegant way to solve this problem?