Adding two variables with missing data

This is probably a very simple question for regular R users, but I cannot find a solution. I want to add two variables with missing data.

    x1 <- c(NA, 3, NA, 5)
    x2 <- c(NA, NA, 4, 3)
    x3 <- x1 + x2
    x3
    [1] NA NA NA  8

But I really want:

 [1] NA 3 4 8 

Any suggestions would be highly appreciated. How can I keep NA only where both values are missing?

+5
4 answers

To return NA only when both values are NA (borrowing @Ben Bolker's approach of using cbind):

    apply(cbind(x1, x2), 1, function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm=T)))
    # [1] NA  3  4  8

Or, if you prefer to use the rowSums function (which is attractive because it is vectorized, whereas the apply and mapply solutions are not):

    rowSums(cbind(x1, x2), na.rm=T) + ifelse(is.na(x1) & is.na(x2), NA, 0)
    # [1] NA  3  4  8

Neither of these will be as fast as an Rcpp function (which only has to loop through the two inputs once):

    library(Rcpp)
    sum.na.ign <- cppFunction("
    NumericVector sumNaIgn(NumericVector x, NumericVector y) {
      const int n = x.size();
      NumericVector out(n);
      for (int i = 0; i < n; ++i) {
        if (R_IsNA(x[i])) {
          out[i] = y[i];
        } else if (R_IsNA(y[i])) {
          out[i] = x[i];
        } else {
          out[i] = x[i] + y[i];
        }
      }
      return out;
    }")
    sum.na.ign(x1, x2)
    # [1] NA  3  4  8

We can benchmark these (along with the mapply-based solution from @J. Won.) on large vectors:

    # First two functions along with mapply-based solution from @J. Won.
    f1 <- function(x1, x2) apply(cbind(x1, x2), 1, function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm=T)))
    f2 <- function(x1, x2) rowSums(cbind(x1, x2), na.rm=T) + ifelse(is.na(x1) & is.na(x2), NA, 0)
    NAsum <- function(...) {
      if (any(!is.na(c(...)))) return(sum(..., na.rm=TRUE))
      return(NA)
    }
    jwon <- function(x1, x2) mapply(NAsum, x1, x2)

    set.seed(144)
    x1 <- sample(c(NA, 1:10), 10000, replace=T)
    x2 <- sample(c(NA, 1:10), 10000, replace=T)

    # all.equal() compares only its first two arguments, so check each result against the first
    ref <- jwon(x1, x2)
    all(sapply(list(f1(x1, x2), f2(x1, x2), sum.na.ign(x1, x2)),
               function(r) isTRUE(all.equal(ref, r))))
    # [1] TRUE

    library(microbenchmark)
    microbenchmark(jwon(x1, x2), f1(x1, x2), f2(x1, x2), sum.na.ign(x1, x2))
    # Unit: microseconds
    #                expr       min         lq       mean     median        uq       max neval
    #        jwon(x1, x2) 24044.658 28387.4280 35580.3434 35134.9940 38175.661 91476.032   100
    #          f1(x1, x2) 37516.769 46664.6390 52293.5265 51570.2690 56647.063 77576.091   100
    #          f2(x1, x2)  2588.820  2738.0740  2930.4106  2833.4880  2974.745  5187.684   100
    #  sum.na.ign(x1, x2)    97.988   109.8575   132.9849   123.0795   142.725   533.275   100

The rowSums solution is vectorized and therefore faster than the apply and mapply solutions (which would be slow for vectors of length 1 million), but the custom Rcpp solution is more than 10 times faster than the rowSums approach. Your vectors would probably need to be quite large for Rcpp to be worthwhile compared to rowSums.

+8
 mapply(sum, x1, x2, na.rm=TRUE) 

EDIT: if we need the more complex behavior described in the comment (NA only when both values are NA), I think it needs a custom function:

    NAsum <- function(...) {
      if (any(!is.na(c(...)))) return(sum(..., na.rm=TRUE))
      return(NA)
    }
    mapply(NAsum, x1, x2)
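To make the difference between the two mapply calls in this answer concrete, here is a small sketch run on the vectors from the question. The variable names `plain` and `guarded` are only illustrative and do not come from the thread.

```r
x1 <- c(NA, 3, NA, 5)
x2 <- c(NA, NA, 4, 3)

# Plain sum with na.rm = TRUE: a pair that is entirely NA sums to 0
plain <- mapply(sum, x1, x2, na.rm = TRUE)
plain
# [1] 0 3 4 8

# NAsum (from the answer) returns NA when every value in the pair is NA
NAsum <- function(...) {
  if (any(!is.na(c(...)))) return(sum(..., na.rm = TRUE))
  return(NA)
}
guarded <- mapply(NAsum, x1, x2)
guarded
# [1] NA  3  4  8
```

The only position that differs is the first one, where both inputs are missing.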
+7

The + operator has no option to ignore NA values, but you can use:

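    rowSums(cbind(x1,x2),na.rm=TRUE)
    ## [1] 0 3 4 8

Note that with the data in the question, the first element is 0 rather than NA, because a row in which every value is NA sums to 0 when na.rm=TRUE.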
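If NA should be kept where every input is missing, one possible way to patch the rowSums result is to count the non-missing values per row. This is only a sketch; `na_sum` is a hypothetical helper name, not something from the thread or from base R.

```r
# Hypothetical helper (not from the thread): element-wise sum that ignores NA
# but returns NA where every input is NA.
na_sum <- function(...) {
  m <- cbind(...)                       # one column per input vector
  total <- rowSums(m, na.rm = TRUE)     # all-NA rows come out as 0 here
  total[rowSums(!is.na(m)) == 0] <- NA  # restore NA where nothing was observed
  total
}

x1 <- c(NA, 3, NA, 5)
x2 <- c(NA, NA, 4, 3)
na_sum(x1, x2)
# [1] NA  3  4  8
```

Because the inputs are collected with cbind, the same helper should also accept more than two vectors of equal length.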
+5

I used the following code, given in an answer above, to solve a similar problem in which I needed to sum more than two variables in a data frame. I'm not sure whether this is allowed on the platform, but I still wanted to share it.

    apply(cbind(x1, x2), 1, function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = T)))

Below is my data:

      x1 x2 x3 VAt
    1 NA NA  a  NA
    2  3 NA  b   1
    3 NA  4  c   2
    4  5  3  d  NA

    library(readr)  # provides read_csv()
    One <- read_csv("~/One.csv")

    # Sum x1, x2 and VAt row by row; NA only when all three are NA
    One$use <- apply(cbind(One$x1, One$x2, One$VAt), 1, function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = T)))

    # The same calculation, using with() to avoid repeating One$
    One$use1 <- with(One, apply(cbind(x1, x2, VAt), 1, function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = T))))

The result is then:

      X1 x1 x2 x3 VAt use use1
    1  1 NA NA  a  NA  NA   NA
    2  2  3 NA  b   1   4    4
    3  3 NA  4  c   2   6    6
    4  4  5  3  d  NA   8    8
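For data frames with many rows, the same result can probably be obtained without apply by combining rowSums with a per-row count of missing values. The sketch below rebuilds the data shown above with data.frame() instead of reading the CSV, and the column name `use2` is only illustrative.

```r
# Rebuild the example data inline (assumption: same values as the table above)
One <- data.frame(
  x1  = c(NA, 3, NA, 5),
  x2  = c(NA, NA, 4, 3),
  x3  = c("a", "b", "c", "d"),
  VAt = c(NA, 1, 2, NA)
)

cols <- One[c("x1", "x2", "VAt")]                   # numeric columns to add up
all_missing <- rowSums(is.na(cols)) == ncol(cols)   # TRUE where every value is NA
One$use2 <- ifelse(all_missing, NA, rowSums(cols, na.rm = TRUE))
One$use2
# [1] NA  4  6  8
```

This matches the `use` and `use1` columns above while staying vectorized over rows.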

Thanks to @swhusky for the question and @josliber for the answer.

0

Source: https://habr.com/ru/post/1213663/

