R: apply expression to take the sum of the number of non-NA values in multiple columns

Question

I have a large data file about visits to the doctor. Each record (row) can contain up to 11 diagnostic codes. I want to know how many diagnostic codes other than NA there are in each row.

Here is an example of data:

 diag1 diag2 diag3 diag4 diag5 diag6 diag7 diag8 diag9 diag10 diag11
   786   272   401   782   250 91912   530    NA    NA     NA     NA
   845   530   338   311    NA    NA    NA    NA    NA     NA     NA

So, for these two rows, I would like to get 7 codes for row 1 and 4 codes for row 2. The data frame has 31,596 rows, so a loop takes too long. I would like to use "apply" to speed things up:

 z = apply(y[,paste("diag", 1:11, sep="")], 1, function(x)sum({any(x[!is.na(x)])}))

R just returns a vector of 1s with the same length as the number of rows in the data set. I think I am doing something wrong with "any"? Does anyone have a good way to count the number of non-NA values across multiple columns? Thank you.

+4

function r rows apply any

mEvans May 07, '12 at 17:07


2 answers

You can also use:

 apply(y, 1, function(x) length(na.omit(x)))

but Joshua Ulrich’s answer is much faster.
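For reference, here is a small sketch of why the `any` attempt in the question collapses to 1, next to the `na.omit` version. The vector `x` is just the first example row from the question typed in by hand, and the variable names are made up for illustration.

```r
# First example row from the question, entered by hand.
x <- c(786, 272, 401, 782, 250, 91912, 530, NA, NA, NA, NA)

# any() reduces the non-NA values to a single TRUE/FALSE
# (coercing the numbers to logical, with a warning),
# so sum() of that single value can only be 0 or 1.
collapsed <- suppressWarnings(sum(any(x[!is.na(x)])))

# Counting the non-NA values directly instead:
n_omit <- length(na.omit(x))  # drop the NAs, then count what is left
n_sum  <- sum(!is.na(x))      # count the TRUEs in the logical vector
```

Both `n_omit` and `n_sum` come out as 7 here, while `collapsed` is 1.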

+3

Tyler Rinker May 07, '12 at 17:42


Joshua Ulrich · Accepted Answer · 2012-05-07T17:14:50+0000

Just use is.na and rowSums:

 z <- rowSums(!is.na(y[,paste("diag", 1:11, sep="")]))
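To see it work on the two example rows from the question, here is a minimal sketch. The data frame `y` is rebuilt by hand from the sample data, so the way it is constructed is an assumption about how the real file looks once loaded.

```r
# Rebuild the two example rows from the question (hand-typed sample data).
vals <- c(786, 272, 401, 782, 250, 91912, 530, NA, NA, NA, NA,
          845, 530, 338, 311,  NA,    NA,  NA, NA, NA, NA, NA)
y <- as.data.frame(matrix(vals, nrow = 2, byrow = TRUE))
names(y) <- paste("diag", 1:11, sep = "")

# !is.na() gives a logical matrix (TRUE where a code is present);
# rowSums() treats TRUE as 1, so each row sum is the number of non-NA codes.
present <- !is.na(y[, paste("diag", 1:11, sep = "")])
z <- rowSums(present)
```

Here `z` holds 7 for the first row and 4 for the second. Because `is.na` and `rowSums` each work on the whole matrix at once, no R function is called per row, which is presumably why this is faster than the `apply` versions.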
