Adding variables while ignoring NA values with the transform function

I have a data frame with many variables, and I create new variables by adding together some of the existing ones. The code I use is:

name_of_data_frame<- transform(name_of_data_frame, new_variable=var1+var2 +....) 

When one of the observations is NA, transform returns NA in the new variable, even if some of the other variables being added were not NA.

E.g. if var1 = 4, var2 = 3 and var3 = NA, then computing var1 + var2 + var3 with transform gives NA, whereas I would like it to give 7.
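For single values, the difference is easy to see in base R: arithmetic operators propagate NA, while sum() can drop NA values through its na.rm argument. A minimal illustration using the numbers above (this only covers the scalar case, not the row-by-row case across columns):

```r
var1 <- 4
var2 <- 3
var3 <- NA

# Arithmetic operators propagate NA: any NA operand makes the result NA
plus_result <- var1 + var2 + var3
print(plus_result)   # NA

# sum() with na.rm = TRUE drops the NA before adding
sum_result <- sum(var1, var2, var3, na.rm = TRUE)
print(sum_result)    # 7
```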

I do not want to recode NA as zero in the data frame, since I may need the NA values later and do not want to confuse them with observations that really are 0.

Any help on how to get around R's NA handling in the way described above, either with transform or with alternative functions, would be great.

Please note that I do not always just sum adjacent variables; I also often combine variables by multiplying, subtracting, etc.

2 answers

My first instinct was to suggest using sum(), since it has an na.rm argument. However, this does not work here, because sum() reduces all of its arguments to a single scalar value rather than returning a vector.
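To make the problem concrete, here is a small sketch (the vectors are illustrative) showing that sum() collapses several vectors into one number instead of giving one value per row:

```r
x <- c(1, 2, 3, NA)
y <- c(NA, 4, 5, NA)

# sum() adds every element of every argument and returns a single number
total <- sum(x, y, na.rm = TRUE)
print(total)          # 15  (1 + 2 + 3 + 4 + 5)
print(length(total))  # 1, not one value per row
```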

This means that you need to write a parallel sum function. Let us call it psum(), by analogy with the base R functions pmin() and pmax():

    psum <- function(..., na.rm = FALSE) {
      x <- list(...)
      # One column per argument, one row per observation
      rowSums(matrix(unlist(x), ncol = length(x)), na.rm = na.rm)
    }

Now set up some data and use psum() to get the desired vector:

    dat <- data.frame(x = c(1, 2, 3, NA),
                      y = c(NA, 4, 5, NA))

    transform(dat, new = psum(x, y, na.rm = TRUE))
       x  y new
    1  1 NA   1
    2  2  4   6
    3  3  5   8
    4 NA NA   0

Similarly, you can define a parallel product, pprod(), as follows:

    pprod <- function(..., na.rm = FALSE) {
      x <- list(...)
      m <- matrix(unlist(x), ncol = length(x))
      # Pass na.rm through instead of hard-coding TRUE
      apply(m, 1, prod, na.rm = na.rm)
    }

    transform(dat, new = pprod(x, y, na.rm = TRUE))
       x  y new
    1  1 NA   1
    2  2  4   8
    3  3  5  15
    4 NA NA   1

This pprod example provides a generic template for what you want to do: create a function that uses apply() to reduce each row of the input matrix to a single value, which gives you the desired vector.
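The template can be taken one step further by passing the row-wise function in as an argument. The helper name prow() below is hypothetical, not a base R function; it is a sketch that assumes all inputs have the same length, and it reuses the same matrix-plus-apply() idea as psum() and pprod():

```r
# Hypothetical helper (not part of base R): apply a summary function f to
# each "row" formed by the parallel vectors in ..., passing na.rm through to f.
prow <- function(f, ..., na.rm = FALSE) {
  x <- list(...)
  # One column per input vector, one row per observation
  m <- matrix(unlist(x), ncol = length(x))
  apply(m, 1, f, na.rm = na.rm)
}

dat <- data.frame(x = c(1, 2, 3, NA), y = c(NA, 4, 5, NA))

# Parallel sum and product built from the same template
res <- transform(dat,
                 total   = prow(sum,  x, y, na.rm = TRUE),
                 product = prow(prod, x, y, na.rm = TRUE))
print(res)
```

Note that a row where every input is NA gives the identity value of f (0 for sum, 1 for prod), exactly as psum() and pprod() do; check whether that is acceptable for your data before relying on it.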

Using rowSums(), and apply() with prod(), can help you.

    set.seed(007)  # Generating some data
    DF <- data.frame(V1 = sample(c(50, NA, 36, 24, 80, NA), 15, replace = TRUE),
                     V2 = sample(c(70, 40, NA, 25, 100, NA), 15, replace = TRUE),
                     V3 = sample(c(20, 26, 34, 15, 78, 40), 15, replace = TRUE))

    transform(DF, Sum = rowSums(DF, na.rm = TRUE))                # Sum (a vector of values)
    transform(DF, Prod = apply(DF, 1, FUN = prod, na.rm = TRUE))  # Prod (a vector of values)

    # Defining a function for subtracting (resta, in Spanish :D)
    resta <- function(x) Reduce(function(a, b) a - b, x[!is.na(x)])
    transform(DF, Subtracting = apply(DF, 1, resta))

    # Defining a function for dividing
    div <- function(x) Reduce(function(a, b) a / b, x[!is.na(x)])
    transform(DF, Division = apply(DF, 1, div))
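The resta() and div() helpers drop the NA values in a row and then fold the remaining values from left to right with Reduce(). A quick check on hand-made rows (the values are illustrative, not taken from the seeded DF), plus one caveat: if every value in a row is NA, Reduce() receives an empty vector and returns NULL, in which case apply() may return a list instead of a numeric vector. In the example above V3 is never NA, so this does not happen there.

```r
# Same helpers as in the answer: drop NAs, then fold left to right
resta <- function(x) Reduce(function(a, b) a - b, x[!is.na(x)])
div   <- function(x) Reduce(function(a, b) a / b, x[!is.na(x)])

row_a <- c(50, NA, 20)   # middle value missing
row_b <- c(80, 40, 20)   # nothing missing

resta(row_a)   # 50 - 20 = 30
resta(row_b)   # (80 - 40) - 20 = 20
div(row_b)     # (80 / 40) / 20 = 0.1

# Caveat: an all-NA row leaves nothing to reduce, so the result is NULL
is.null(resta(c(NA, NA, NA)))  # TRUE
```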