This is a more complicated follow-up to my previous question. The answer there was to use a matrix, but that does not work with data frames that hold values of different modes.
I want to combine data frames of different sizes with columns of characters and integers and calculate their sum depending on several conditions.
Conditions
- sums are calculated only across rows that have the same "Name" value
- sums are calculated only across matching column names
- if the cell in df4 is neither 0 nor NA, the sum should be df3 + df4; otherwise the sum should be df1 + df2 + df3
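The third condition can be written as a small vectorized rule. This is only a minimal sketch: pick_sum and its argument names are hypothetical, and it assumes the two candidate sums have already been computed for the same cells. The values used are Lee's X1 to X3 from the example further down.

```r
# Hypothetical helper (not from the original post).
# s123: the df1 + df2 + df3 sums, s34: the df3 + df4 sums, d4: the df4 cells.
pick_sum <- function(s123, s34, d4) {
  # TRUE only where df4 is present and non-zero; NA cells give FALSE
  use_d4 <- !is.na(d4) & d4 != 0
  ifelse(use_d4, s34, s123)
}

# Lee, columns X1..X3: df1 = 5,0,2  df3 = 1,3,4  df4 = 6,0,0
lee <- pick_sum(s123 = c(6, 3, 6), s34 = c(7, 3, 4), d4 = c(6, 0, 0))
print(lee)
```

Only the first cell takes the df3 + df4 value, because that is the only cell where df4 is neither 0 nor NA.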
Example
    > df1 <- data.frame(Name=c("Joe","Ann","Lee","Dan"), "1"=c(0,1,5,2), "2"=c(3,1,0,0), "3"=c(2,0,2,2), "4"=c(2,1,3,4))
    > df1
      Name X1 X2 X3 X4
    1  Joe  0  3  2  2
    2  Ann  1  1  0  1
    3  Lee  5  0  2  3
    4  Dan  2  0  2  4
    > df2 <- data.frame(Name=c("Joe","Ann","Ken"), "1"=c(3,4,1), "2"=c(2,3,0), "3"=c(2,4,3))
    > df2
      Name X1 X2 X3
    1  Joe  3  2  2
    2  Ann  4  3  4
    3  Ken  1  0  3
    > df3 <- data.frame(Name=c("Lee","Ben"), "1"=c(1,3), "2"=c(3,4), "3"=c(4,3))
    > df3
      Name X1 X2 X3
    1  Lee  1  3  4
    2  Ben  3  4  3
The condition depends on this frame:
    > df4 <- data.frame(Name=c("Lee","Ann","Dan"), "1"=c(6,0,NA), "2"=c(0,0,4), "3"=c(0,NA,0))
    > df4
      Name X1 X2 X3
    1  Lee  6  0  0
    2  Ann  0  0 NA
    3  Dan NA  4  0
With the examples above, this is the expected result (values marked * depend on df4):

    > dfsum
      Name X1 X2 X3 X4
    1  Joe  3  5  4  2
    2  Ann  5  4  4  1
    3  Lee 7*  3  6  3
    4  Dan  2 4*  2  4
    5  Ken  1  0  3 NA
    6  Ben  3  4  3 NA
Possible steps?
First expand df1, df2, df3 and df4 to 5 columns and 6 rows, filling the missing cells with NA.
Then for each data frame:
- sort the rows by "Name"
- separate the "Name" column from "X1" ... "X4"
- convert the columns "X1" ... "X4" to a matrix
- calculate the sums of the matrices, as in the answer to my other question, but with the additional condition 1
- convert the result matrix into a data frame
- cbind the "Name" column with the result data frame
How can this be done in R?
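The steps listed above can be sketched in base R roughly as follows. This is one possible reading of the question and not the accepted solution: the helper names to_matrix and na_sum are hypothetical, the columns are named X1 ... X4 directly, and rows are aligned by indexing on the name instead of sorting.

```r
# Example data from the question, with the column names R would generate
df1 <- data.frame(Name = c("Joe", "Ann", "Lee", "Dan"),
                  X1 = c(0, 1, 5, 2), X2 = c(3, 1, 0, 0),
                  X3 = c(2, 0, 2, 2), X4 = c(2, 1, 3, 4))
df2 <- data.frame(Name = c("Joe", "Ann", "Ken"),
                  X1 = c(3, 4, 1), X2 = c(2, 3, 0), X3 = c(2, 4, 3))
df3 <- data.frame(Name = c("Lee", "Ben"),
                  X1 = c(1, 3), X2 = c(3, 4), X3 = c(4, 3))
df4 <- data.frame(Name = c("Lee", "Ann", "Dan"),
                  X1 = c(6, 0, NA), X2 = c(0, 0, 4), X3 = c(0, NA, 0))

dfs <- list(df1, df2, df3, df4)
all_names <- unique(unlist(lapply(dfs, function(d) as.character(d$Name))))
all_cols  <- unique(unlist(lapply(dfs, function(d) setdiff(names(d), "Name"))))

# Hypothetical helper: expand one data frame to the full Name x column grid,
# padding cells that do not exist in it with NA
to_matrix <- function(d) {
  cols <- setdiff(names(d), "Name")
  m <- matrix(NA_real_, nrow = length(all_names), ncol = length(all_cols),
              dimnames = list(all_names, all_cols))
  m[as.character(d$Name), cols] <- as.matrix(d[, cols, drop = FALSE])
  m
}

# Hypothetical helper: cell-wise sum that ignores NA, but stays NA
# where every input is NA
na_sum <- function(mats) {
  total  <- Reduce(`+`, lapply(mats, function(m) { m[is.na(m)] <- 0; m }))
  all_na <- Reduce(`&`, lapply(mats, is.na))
  total[all_na] <- NA
  total
}

m    <- lapply(dfs, to_matrix)
s123 <- na_sum(m[1:3])                      # df1 + df2 + df3
s34  <- na_sum(m[3:4])                      # df3 + df4
use_d4 <- !is.na(m[[4]]) & m[[4]] != 0      # df4 cell is neither 0 nor NA

res <- s123
res[use_d4] <- s34[use_d4]
dfsum <- data.frame(Name = rownames(res), res, row.names = NULL)
print(dfsum)
```

Indexing the padded matrix by row and column names handles both the "matching Name" and the "matching column" conditions at once, so no explicit sorting or merging is needed in this sketch.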
Solution
@Ricardo Saporta's solution works with minor changes:
Add , padValue=NA) to each of the four addCols() calls.
As noted here, replace the definitions of sumD3D4 and dtsum with:
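    # Sum that ignores NA, but returns NA when every value is NA
    plus <- function(x) {
      if (all(is.na(x))) {
        c(x[0], NA)
      } else {
        sum(x, na.rm = TRUE)
      }
    }
    # dt1 ... dt4 are the padded data.tables from the referenced solution
    sumD3D4 <- setkey(rbind(dt3, dt4)[, lapply(.SD, plus), by = Name], "Name")
    dtsum <- setkey(rbind(dt1, dt2, dt3)[, lapply(.SD, plus), by = Name], "Name")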
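To see what plus does on its own, here is a small sketch on plain vectors, with no data.table required. The explanation of the c(x[0], NA) idiom is my reading and not something the original post states: x[0] is an empty vector of the same type as x, so the returned NA keeps that type, which presumably keeps the per-group results consistent.

```r
# Same definition as in the post
plus <- function(x) {
  if (all(is.na(x))) {
    c(x[0], NA)
  } else {
    sum(x, na.rm = TRUE)
  }
}

some_missing <- plus(c(1, NA, 2))                    # NA is ignored
all_missing  <- plus(c(NA_real_, NA_real_))          # stays NA, as a double
int_missing  <- plus(c(NA_integer_, NA_integer_))    # stays NA, as an integer

print(some_missing)
print(typeof(all_missing))
print(typeof(int_missing))
```

A plain sum(x, na.rm = TRUE) would return 0 for an all-NA group, which is why Ken and Ben would otherwise get 0 instead of NA in column X4.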