Adding variables while ignoring NAs using the transform function

Question


I have a data frame with a lot of variables. I am creating new variables by adding some of the old ones. The code I use for this:

    name_of_data_frame <- transform(name_of_data_frame, new_variable = var1 + var2 + ....)

When one of the variables is NA for an observation, transform returns NA in the new variable, even if the other variables being added were not NA.

E.g., if var1 = 4, var2 = 3 and var3 = NA, then computing var1 + var2 + var3 with transform gives NA, whereas I would like it to give 7.
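A minimal sketch reproducing this with the scalar values from the example: ordinary arithmetic propagates NA, while sum() can be told to drop it.

```r
var1 <- 4
var2 <- 3
var3 <- NA

var1 + var2 + var3                   # NA: arithmetic propagates the missing value
sum(var1, var2, var3, na.rm = TRUE)  # 7: sum() drops the NA when na.rm = TRUE
```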

I do not want to recode my NAs to zero in the data frame, since I may need to refer back to them later and do not want to confuse NAs with observations that really were 0.

Any help on getting around R's NA handling in the way described above with the transform function would be great (alternative functions would also be welcome).

Please note that I do not always just sum variables that sit next to each other; I also often combine variables by multiplying, subtracting, etc.


r

Timothy alston Aug 27 '12 at 9:32


2 answers

Using rowSums and prod can help you.

    set.seed(007)  # Generating some data
    DF <- data.frame(V1 = sample(c(50, NA, 36, 24, 80, NA), 15, replace = TRUE),
                     V2 = sample(c(70, 40, NA, 25, 100, NA), 15, replace = TRUE),
                     V3 = sample(c(20, 26, 34, 15, 78, 40), 15, replace = TRUE))

    transform(DF, Sum = rowSums(DF, na.rm = TRUE))                # Sum (a vector of values)
    transform(DF, Prod = apply(DF, 1, FUN = prod, na.rm = TRUE))  # Prod (a vector of values)

    # Defining a function for subtracting (resta, in Spanish :D)
    resta <- function(x) Reduce(function(a, b) a - b, x[!is.na(x)])
    transform(DF, Subtracting = apply(DF, 1, resta))

    # Defining a function for dividing
    div <- function(x) Reduce(function(a, b) a / b, x[!is.na(x)])
    transform(DF, Division = apply(DF, 1, div))
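The calls above apply each function to every column of DF. Since the question mentions combining columns that are not next to each other, here is a sketch of restricting the same rowSums() call to a chosen subset of columns by name (the small data frame and the choice of V1 and V3 are illustrative assumptions):

```r
# Illustrative data; V2 is deliberately left out of the combination below
DF <- data.frame(V1 = c(50, NA, 36),
                 V2 = c(70, 40, NA),
                 V3 = c(20, 26, 34))

# Pick the columns to combine by name, then sum across each row, skipping NAs
cols <- c("V1", "V3")
res <- transform(DF, V1_plus_V3 = rowSums(DF[, cols], na.rm = TRUE))
res
```

The same column selection works with the apply()-based products and the resta()/div() helpers, e.g. apply(DF[, cols], 1, prod, na.rm = TRUE).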


Jilber urbina Aug 27 '12 at 11:10


Andrie · Accepted Answer · 2012-08-27T10:33:37+0000

My first instinct was to suggest using sum(), since it accepts an na.rm argument. However, this does not work, because sum() reduces all of its arguments to a single scalar value rather than to a vector.
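To see the problem concretely, a small sketch using the same two vectors as the data below: sum() with na.rm = TRUE happily skips the NAs, but it returns one grand total instead of one value per row.

```r
a <- c(1, 2, 3, NA)
b <- c(NA, 4, 5, NA)

# One number for all values combined (1 + 2 + 3 + 4 + 5), not a row-wise result
total <- sum(a, b, na.rm = TRUE)
total  # 15
```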

This means that you need to write a parallel sum function. Let's call it psum(), by analogy with the base R functions pmin() and pmax():

    psum <- function(..., na.rm = FALSE) {
      x <- list(...)
      rowSums(matrix(unlist(x), ncol = length(x)), na.rm = na.rm)
    }
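For comparison, a sketch of a hypothetical variant, psum_reduce() (not from the original answer), that folds the input vectors together pairwise with Reduce() instead of building a matrix. It assumes that with na.rm = TRUE each NA should count as 0, which matches rowSums(..., na.rm = TRUE) for equal-length inputs, including 0 for a row where every value is missing.

```r
# Hypothetical alternative to psum(): adds the vectors pairwise with Reduce()
psum_reduce <- function(..., na.rm = FALSE) {
  add <- function(a, b) {
    if (na.rm) {
      # Assumption: a missing value contributes 0 to the running sum
      a[is.na(a)] <- 0
      b[is.na(b)] <- 0
    }
    a + b
  }
  # Starting from init = 0 means even a single input goes through add()
  Reduce(add, list(...), init = 0)
}

psum_reduce(c(1, 2, 3, NA), c(NA, 4, 5, NA), na.rm = TRUE)  # 1 6 8 0
psum_reduce(c(1, 2, 3, NA), c(NA, 4, 5, NA))                # NA 6 8 NA
```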

Now set up some data and use psum() to get the desired vector:

    dat <- data.frame(
      x = c(1, 2, 3, NA),
      y = c(NA, 4, 5, NA))

    transform(dat, new = psum(x, y, na.rm = TRUE))
       x  y new
    1  1 NA   1
    2  2  4   6
    3  3  5   8
    4 NA NA   0
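Because psum() only adds, subtraction can be expressed by negating the columns to subtract: -NA is still NA, so it is dropped like any other missing value. A sketch assuming a missing value should count as 0 (the behaviour asked for in the question, where 4 + 3 + NA should give 7). Note that this differs from resta() in the other answer, which starts from the first non-missing value: for x = NA, y = 4 it gives 4, whereas the version below gives -4.

```r
# psum() as defined in the answer above
psum <- function(..., na.rm = FALSE) {
  x <- list(...)
  rowSums(matrix(unlist(x), ncol = length(x)), na.rm = na.rm)
}

dat <- data.frame(x = c(1, 2, 3, NA), y = c(NA, 4, 5, NA))

# x - y with missing values treated as 0: negate y, then add
res <- transform(dat, x_minus_y = psum(x, -y, na.rm = TRUE))
res
```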

Similarly, you can define a parallel product, pprod(), as follows:

    pprod <- function(..., na.rm = FALSE) {
      x <- list(...)
      m <- matrix(unlist(x), ncol = length(x))
      apply(m, 1, prod, na.rm = na.rm)  # pass na.rm through instead of hard-coding TRUE
    }

    transform(dat, new = pprod(x, y, na.rm = TRUE))
       x  y new
    1  1 NA   1
    2  2  4   8
    3  3  5  15
    4 NA NA   1
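The same trick gives division with pprod(): dividing by y is multiplying by 1/y, and 1/NA is NA, so it is dropped. A sketch assuming a missing value should count as 1 in a product; as in the pprod() output above, a row where every value is missing also comes out as 1 (the empty product).

```r
# pprod() as defined above, with na.rm passed through
pprod <- function(..., na.rm = FALSE) {
  x <- list(...)
  m <- matrix(unlist(x), ncol = length(x))
  apply(m, 1, prod, na.rm = na.rm)
}

dat <- data.frame(x = c(1, 2, 3, NA), y = c(NA, 4, 5, NA))

# x / y with missing values treated as 1: take reciprocals of y, then multiply
res <- transform(dat, x_over_y = pprod(x, 1 / y, na.rm = TRUE))
res
```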

This pprod() example provides a generic template for what you want to do: create a function that uses apply() to reduce each row of the input matrix to a single value, giving the desired vector.
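Following that template, a sketch of a hypothetical helper, make_parallel() (not part of base R), that turns any summary function accepting na.rm into a row-wise version. It builds the matrix with do.call(cbind, ...), which assumes all inputs have the same length. With mean(), a row where every value is missing gives NaN.

```r
# Hypothetical helper: wraps a summary function f (one that accepts na.rm)
# into a row-wise "parallel" version, following the pprod() template
make_parallel <- function(f) {
  force(f)  # capture f now rather than lazily
  function(..., na.rm = FALSE) {
    m <- do.call(cbind, list(...))  # one column per input vector
    apply(m, 1, f, na.rm = na.rm)   # reduce each row to a single value
  }
}

pmean <- make_parallel(mean)

dat <- data.frame(x = c(1, 2, 3, NA), y = c(NA, 4, 5, NA))
res <- transform(dat, avg = pmean(x, y, na.rm = TRUE))
res
```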

