R: How can I sum variables within cases, treating NA as zero?

Fake data to illustrate:

df <- data.frame(a=c(1,2,3,4,5), b=c(2,2,2,2,NA),
                 c=c(NA,2,3,4,5))

This gives me the answer I want, except in rows that contain NA values:

df$count <- with(df, (a==1) + (b==2) + (c==3)) 
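
To make the problem concrete, here is a small self-contained sketch of what that line returns on the fake data. The name `naive` is only illustrative. Any comparison against NA is itself NA, and adding NA to the other terms makes the whole row's count NA:

```r
# Same fake data as above, rebuilt so the snippet runs on its own
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))

# NA == 3 evaluates to NA, and TRUE + TRUE + NA is NA,
# so rows 1 and 5 lose their counts entirely
naive <- with(df, (a == 1) + (b == 2) + (c == 3))
print(naive)  # NA 1 2 1 NA
```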

Also, is there a more elegant way if I am only interested in, for example, variables == 2?

df$count <- with(df, (a==2) + (b==2) + (c==2)) 

Many thanks!

2 answers

The following works for your specific example, though I suspect your real use case is more complex:

# Restrict to the original columns so an existing count column is not compared too
df$count <- apply(df[c("a","b","c")],1,function(x){sum(x == 1:3,na.rm = TRUE)})
> df
  a  b  c count
1 1  2 NA     2
2 2  2  2     1
3 3  2  3     2
4 4  2  4     1
5 5 NA  5     0

This general approach should carry over, though. For example, your second case would look something like this:

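# Select a, b and c explicitly; otherwise a count column already in df is tested against 2 as well
df$count <- apply(df[c("a","b","c")],1,function(x){sum(x == 2,na.rm = TRUE)})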

or, more generally, you can pass the comparison values in as an argument:

df$count <- apply(df[c("a","b","c")],1,function(x,compare){sum(x == compare,na.rm = TRUE)},compare = 1:3)
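
If you find yourself doing this repeatedly, the same idea can be wrapped in a small function. This is only a sketch: `count_matches` and its argument names are made up for illustration, and it assumes the selected columns are all numeric.

```r
# Fake data from the question, rebuilt so the snippet is self-contained
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))

# Hypothetical helper: for each row, count how many of the columns in `cols`
# equal the matching entry of `compare`. NA comparisons are dropped by
# na.rm = TRUE, so they effectively count as zero.
count_matches <- function(data, cols, compare) {
  hits <- apply(data[cols], 1, function(x) sum(x == compare, na.rm = TRUE))
  unname(hits)
}

per_column <- count_matches(df, c("a", "b", "c"), 1:3)  # a==1, b==2, c==3
all_twos   <- count_matches(df, c("a", "b", "c"), 2)    # every column == 2
print(per_column)  # 2 1 2 1 0
print(all_twos)    # 1 3 1 1 0
```

Naming the columns explicitly means the result does not change if df has already picked up a count column from an earlier run.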

Subtract the targets from the data.frame, negate the result (so that zeros become TRUE), and use rowSums with na.rm=TRUE:

target <- 1:3
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 2 1 2 1 0

target <- rep(2,3)
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 1 3 1 1 0
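
The trick relies on two things: `rep(target, each=nrow(df))` lines each target up with one whole column, and `!` applied to a number is TRUE only for zero. The sketch below shows both pieces and then an equivalent formulation using `sweep()` with `"=="`, which some may find easier to read. The names `eq` and `counts` are illustrative only.

```r
# Fake data from the question, rebuilt so the snippet is self-contained
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))
target <- 1:3

# Each target value is repeated once per row, matching column-wise storage
print(rep(target, each = nrow(df)))  # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

# Negating a number gives TRUE only for 0; NA stays NA
print(!c(0, 1, -1, NA))  # TRUE FALSE FALSE NA

# Alternative sketch: compare every column with its own target directly
eq <- sweep(as.matrix(df), 2, target, FUN = "==")
counts <- rowSums(eq, na.rm = TRUE)
print(counts)  # 2 1 2 1 0
```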