Custom na.action in R

I'm currently trying to build an LDA model on a dataset that contains some missing values (`NA`). For example, I want to replace each `NA` with the column mean. As far as I understand, I can set `na.action=na.omit` in the `lda` and `predict` functions, which will drop incomplete observations when building the model and return `NA` when making predictions.

```r
library(MASS)  # provides lda()

my.dat <- as.data.frame(cbind(c(0, 1, 0, 1, 1, 0),
                              c(5, 8, 9, 1, -1, NA),
                              c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)))
mod <- lda(my.dat[,-1], my.dat[,1], na.action=na.omit)
predict(mod, my.dat[,-1], na.action=na.omit)
```
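For comparison, here is what `na.omit` does to an ordinary data frame built from the same values. This is a sketch that calls `na.omit` directly, outside `lda`: row 6, the only incomplete row, is dropped and its index is recorded in an `na.action` attribute of class `"omit"`.

```r
# Same values as above; as.data.frame(cbind(...)) names the columns V1, V2, V3
my.dat <- as.data.frame(cbind(c(0, 1, 0, 1, 1, 0),
                              c(5, 8, 9, 1, -1, NA),
                              c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)))

complete <- na.omit(my.dat)
nrow(complete)                  # 5: row 6 had an NA in V2
dropped <- attr(complete, "na.action")
as.integer(dropped)             # 6
class(dropped)                  # "omit"
```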

But now I want to impute the column mean wherever a value is `NA`. So I can define my own `na.impute` function. But I can't work out what is passed to this function, and what it needs to return.

```r
na.impute <- function(object) {
  print(object)
  object
}
```

which gives me the output:

```
[1] g x
<0 rows> (or 0-length row.names)
```

which doesn't make much sense to me. I can't find any guidance in the documentation. What is `object`, and how can I manipulate it to overwrite the `NA` values?

1 answer

Here is one way to find out what `object` is:

```r
na.impute <- function(object) {
  browser()
  print(object)
  object
}
lda(my.dat[,-1], my.dat[,1], na.action=na.impute)
# Called from: na.action(structure(list(g = grouping, x = x), class = "data.frame"))
Browse[1]> str(object)
# 'data.frame': 0 obs. of 2 variables:
#  $ g: num 0 1 0 1 1 0
#  $ x: matrix [1:6, 1:2] 5 8 9 1 -1 NA -2.4 -4 -4.4 -0.5 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : NULL
#   .. ..$ : chr "V2" "V3"
Browse[1]> object$g
# [1] 0 1 0 1 1 0
Browse[1]> object$x
#      V2   V3
# [1,]  5 -2.4
# [2,]  8 -4.0
# [3,]  9 -4.4
# [4,]  1 -0.5
# [5,] -1  0.7
# [6,] NA -0.3
# attr(,"class")
# [1] "matrix"
```
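The `<0 rows>` printout from the question can be reproduced without calling `lda` at all. This minimal sketch rebuilds the object from the values seen in the browser session, then shows that setting a `row.names` attribute by hand is enough for R to count all 6 rows:

```r
# Rebuild the object that lda.matrix hands to na.action
grouping <- c(0, 1, 0, 1, 1, 0)
x <- cbind(V2 = c(5, 8, 9, 1, -1, NA),
           V3 = c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3))

obj <- structure(list(g = grouping, x = x), class = "data.frame")
nrow(obj)        # 0: no row.names attribute, so R counts zero rows
length(obj$g)    # 6: the data themselves are all there
dim(obj$x)       # 6 2

# Adding row names turns it into a data frame with 6 rows
obj2 <- obj
attr(obj2, "row.names") <- seq_along(grouping)
nrow(obj2)       # 6
```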

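So this is a rather unusual object: `structure(list(g = grouping, x = x), class = "data.frame")`. It has class `"data.frame"` but no `row.names` attribute, which is why `print` and `str` report 0 rows even though `g` and `x` hold all 6 observations. Another way to see where it comes from is to look at the `lda` function itself: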

```r
lda
# function (x, ...)
# UseMethod("lda")
# <bytecode: 0x0e3583fc>
# <environment: namespace:MASS>
methods(lda)
# [1] lda.collapsed.gibbs.sampler lda.data.frame*             lda.default*
# [4] lda.formula*                lda.matrix*
#
#    Non-visible functions are asterisked
```
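Which method runs depends on the class of the first argument. A quick check that `my.dat[,-1]` is a data frame, and that the non-exported methods can still be fetched. This sketch uses `utils::getS3method()`, which the answer itself does not use:

```r
library(MASS)

my.dat <- as.data.frame(cbind(c(0, 1, 0, 1, 1, 0),
                              c(5, 8, 9, 1, -1, NA),
                              c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)))

# Dropping one of three columns still leaves a data frame
class(my.dat[, -1])        # "data.frame"

# Registered S3 methods are retrievable even though MASS does not export them
lda.df  <- getS3method("lda", "data.frame")
lda.mat <- getS3method("lda", "matrix")

# lda.matrix is the method that accepts na.action
names(formals(lda.mat))
```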

Here we are interested in `lda.data.frame`, because `my.dat[,-1]` is a data frame. Since it is marked with an asterisk (not exported), we must use either `MASS:::lda.data.frame` or `getAnywhere("lda.data.frame")` to see the source code:

```r
function (x, ...)
{
    res <- lda(structure(data.matrix(x), class = "matrix"), ...)
    cl <- match.call()
    cl[[1L]] <- as.name("lda")
    res$call <- cl
    res
}
<bytecode: 0x067c3248>
<environment: namespace:MASS>
```
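The explicit `class = "matrix"` set here is also why `object$x` printed with an `attr(,"class")` line in the browser session. A small sketch of that conversion step on its own, using the same predictor values:

```r
df <- data.frame(V2 = c(5, 8, 9, 1, -1, NA),
                 V3 = c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3))

# The conversion lda.data.frame performs before calling lda() again
m <- structure(data.matrix(df), class = "matrix")

attr(m, "class")   # "matrix", stored as an explicit attribute
dim(m)             # 6 2
colnames(m)        # "V2" "V3"
is.na(m[6, 1])     # TRUE: the NA is passed through unchanged
```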

So `lda.data.frame` converts its input to a matrix and calls `lda` again, which dispatches to `lda.matrix`. Its source can be viewed with either of the same two functions:

```r
function (x, grouping, ..., subset, na.action)
{
    if (!missing(subset)) {
        x <- x[subset, , drop = FALSE]
        grouping <- grouping[subset]
    }
    if (!missing(na.action)) {
        dfr <- na.action(structure(list(g = grouping, x = x),
            class = "data.frame"))
        grouping <- dfr$g
        x <- dfr$x
    }
    res <- lda.default(x, grouping, ...)
    cl <- match.call()
    cl[[1L]] <- as.name("lda")
    res$call <- cl
    res
}
<bytecode: 0x067bf7b8>
<environment: namespace:MASS>
```
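To see what `lda.matrix` expects in isolation, here is a hypothetical `na.action` (the name `na.drop.incomplete` is made up for this sketch) that drops incomplete rows while keeping the `g`/`x` structure that `lda.matrix` reads back. It is first called directly on a hand-built pseudo data frame, then passed to `lda`:

```r
library(MASS)

my.dat <- as.data.frame(cbind(c(0, 1, 0, 1, 1, 0),
                              c(5, 8, 9, 1, -1, NA),
                              c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)))

# Hypothetical na.action: receives list(g = grouping, x = predictor matrix)
# classed as "data.frame", returns the same structure with incomplete rows removed
na.drop.incomplete <- function(object) {
  keep <- complete.cases(object$x) & !is.na(object$g)
  structure(list(g = object$g[keep],
                 x = object$x[keep, , drop = FALSE]),
            class = "data.frame")
}

# Call it the way lda.matrix does
dfr <- na.drop.incomplete(structure(list(g = my.dat[, 1],
                                         x = data.matrix(my.dat[, -1])),
                                    class = "data.frame"))
length(dfr$g)   # 5
dim(dfr$x)      # 5 2

mod <- lda(my.dat[, -1], my.dat[, 1], na.action = na.drop.incomplete)
mod$N           # 5 observations used in the fit
mod$counts      # group sizes after dropping row 6
```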

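And finally, here is the `na.action` call we were looking for. `lda.matrix` passes `na.action` a list with components `g` (the grouping) and `x` (the predictor matrix), classed as `"data.frame"`, and then reads `$g` and `$x` from whatever it returns. So a custom `na.action` must return an object with those two components. Here is a function that replaces `NA` values with the column mean: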

```r
na.impute <- function(object) {
  temp <- object$x
  # Row/column positions of the missing values
  k <- which(is.na(temp), arr.ind = TRUE)
  # Replace each NA with the mean of its column
  temp[k] <- colMeans(temp, na.rm = TRUE)[k[, 2]]
  structure(list(g = object$g, x = as.matrix(temp)), class = "data.frame")
}
lda(my.dat[,-1], my.dat[,1], na.action=na.impute)
# Call:
# lda(my.dat[, -1], my.dat[, 1], na.action = na.impute)
#
# Prior probabilities of groups:
#   0   1
# 0.5 0.5
#
# Group means:
#         V2        V3
# 0 6.133333 -2.366667
# 1 2.666667 -1.266667
#
# Coefficients of linear discriminants:
#           LD1
# V2 -0.8155124
# V3 -1.1614265
```
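If you don't specifically need `na.action`, a simpler route is to impute before calling `lda`. In this sketch the helper `impute.means` is hypothetical (not part of MASS); it fills each column's `NA` with that column's mean and gives the same group means as the `na.impute` fit:

```r
library(MASS)

my.dat <- as.data.frame(cbind(c(0, 1, 0, 1, 1, 0),
                              c(5, 8, 9, 1, -1, NA),
                              c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)))

# Hypothetical helper: replace NA in every column by that column's mean
impute.means <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

X <- impute.means(my.dat[, -1])
X$V2[6]          # 4.4, the mean of 5, 8, 9, 1, -1
mod <- lda(X, my.dat[, 1])
mod$means        # group means, matching the na.impute output
```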

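As for `predict` and `na.action`: `predict.lda` has no `na.action` argument (see `getAnywhere("predict.lda")`), so an `na.action` passed to `predict` is absorbed by `...` and ignored. Missing values in `newdata` have to be handled before calling `predict`.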
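One way to do that, sketched under the assumption that new data should be filled with the *training* column means: store those means once and apply them to `newdata` before `predict`. The helper `fill.na` is hypothetical, and `na.impute` is the function from the answer above:

```r
library(MASS)

my.dat <- as.data.frame(cbind(c(0, 1, 0, 1, 1, 0),
                              c(5, 8, 9, 1, -1, NA),
                              c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)))

# The answer's na.impute: fill NA in x with column means
na.impute <- function(object) {
  temp <- object$x
  k <- which(is.na(temp), arr.ind = TRUE)
  temp[k] <- colMeans(temp, na.rm = TRUE)[k[, 2]]
  structure(list(g = object$g, x = as.matrix(temp)), class = "data.frame")
}
mod <- lda(my.dat[, -1], my.dat[, 1], na.action = na.impute)

# Training means, kept so that new data are filled with the same values
train.means <- colMeans(my.dat[, -1], na.rm = TRUE)

# Hypothetical helper: fill NA in each named column of newdata
fill.na <- function(newdata, means) {
  for (v in names(means)) {
    newdata[[v]][is.na(newdata[[v]])] <- means[[v]]
  }
  newdata
}

pred <- predict(mod, fill.na(my.dat[, -1], train.means))
pred$class
```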
