"Object not found" error when passing a model formula to another function

I have a strange problem with R that I cannot work out.

I tried to write a function that performs K-fold cross validation for the model selected by the step procedure in R. (I know the problems with step procedures, this is purely for comparison) :)

Now the problem is that if I define the function's arguments (linmod, k, direction) in the workspace and run the body of the function line by line, it works flawlessly. BUT, if I run it as a function, I get an error saying that the object datas.train could not be found.

I tried stepping through the function with debug(), and the object clearly exists, but R says it does not when I actually run the function. If I just fit the model using lm(), it works fine, so I believe the problem is the step function being called in a loop while inside a function. (Try commenting out the step command and predicting from the ordinary linear model instead.)

    #CREATE A LINEAR MODEL TO TEST FUNCTION
    lm.cars <- lm(mpg~.,data=mtcars,x=TRUE,y=TRUE)

    #THE FUNCTION
    cv.step <- function(linmod,k=10,direction="both"){
      response <- linmod$y
      dmatrix <- linmod$x
      n <- length(response)
      datas <- linmod$model
      form <- formula(linmod$call)

      # generate indices for cross validation
      rar <- n/k
      xval.idx <- list()
      s <- sample(1:n, n) # permutation of 1:n
      for (i in 1:k) {
        xval.idx[[i]] <- s[(ceiling(rar*(i-1))+1):(ceiling(rar*i))]
      }

      #error calculation
      errors <- R2 <- 0
      for (j in 1:k){
        datas.test <- datas[xval.idx[[j]],]
        datas.train <- datas[-xval.idx[[j]],]
        test.idx <- xval.idx[[j]]

        #THE MODELS+
        lm.1 <- lm(form,data= datas.train)
        lm.step <- step(lm.1,direction=direction,trace=0)
        step.pred <- predict(lm.step,newdata= datas.test)
        step.error <- sum((step.pred-response[test.idx])^2)
        errors[j] <- step.error/length(response[test.idx])
        SS.tot <- sum((response[test.idx] - mean(response[test.idx]))^2)
        R2[j] <- 1 - step.error/SS.tot
      }

      CVerror <- sum(errors)/k
      CV.R2 <- sum(R2)/k
      res <- list()
      res$CV.error <- CVerror
      res$CV.R2 <- CV.R2
      return(res)
    }

    #TESTING OUT THE FUNCTION
    cv.step(lm.cars)

Any thoughts?

2 answers

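When you fitted lm.cars, its formula was assigned an environment. That environment stays with the formula unless you explicitly change it. So when you extract the formula with the formula function inside cv.step, it does not pick up the environment of cv.step, which is where datas.train lives. Functions such as step that later re-evaluate the model's data argument look it up in the formula's environment and cannot find it there.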
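To make the idea of a formula "carrying" an environment concrete, here is a minimal sketch. The names make_formula and local_only are made up for illustration and do not appear in the question. It only shows that a formula remembers the frame it was created in and that the environment can be reassigned.

```r
# Hypothetical helper: creates a formula inside a function frame.
make_formula <- function() {
  local_only <- 42   # exists only in this function's frame
  y ~ x              # the formula captures this frame as its environment
}

f_inner <- make_formula()

# The formula's environment is the function frame, so local_only is visible there.
found_before <- exists("local_only", envir = environment(f_inner), inherits = FALSE)

# Reassigning the environment changes where variables will be looked up.
environment(f_inner) <- new.env()
found_after <- exists("local_only", envir = environment(f_inner), inherits = FALSE)

print(c(before = found_before, after = found_after))
```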

I don’t know if I am using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:

    cv.step <- function(linmod,k=10,direction="both"){
      response <- linmod$y
      dmatrix <- linmod$x
      n <- length(response)
      datas <- linmod$model
      .env <- environment() ## identify the environment of cv.step

      ## extract the formula in the environment of cv.step
      form <- as.formula(linmod$call, env = .env)

      ## The rest of your function follows
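As a rough check of this idea on a single train/test split, the sketch below uses a hypothetical helper, step_one_fold, which is not part of the original function. It resets the formula's environment with environment<- rather than with as.formula(..., env = ), which should have the same effect here.

```r
# Hypothetical helper (not from the original post): one train/test split only.
step_one_fold <- function(linmod, test.rows, direction = "both") {
  datas <- linmod$model
  datas.train <- datas[-test.rows, ]
  datas.test <- datas[test.rows, ]

  form <- formula(linmod)
  # Point the formula at this function's frame, where datas.train lives.
  environment(form) <- environment()

  lm.1 <- lm(form, data = datas.train)
  lm.step <- step(lm.1, direction = direction, trace = 0)
  predict(lm.step, newdata = datas.test)
}

lm.cars <- lm(mpg ~ ., data = mtcars, x = TRUE, y = TRUE)
pred <- step_one_fold(lm.cars, test.rows = 1:4)
print(pred)
```

The same one-line change should carry over to the loop in cv.step, since every fold's datas.train is created in the same function frame.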

Another problem that can cause this is passing a character string to lm instead of a formula. A character vector does not have an environment, so when lm converts the string to a formula, the result apparently does not get the environment of the calling function either. If you then use as weights an object that is not a column of the data.frame passed as the data argument, but is an argument of the local function, you get an "object not found" error. This behavior is not easy to understand and may be a bug.

Here is a minimal reproducible example. Each function takes a data.frame, two variable names, and a weight vector to use.

    residualizer = function(data, x, y, wtds) {
      #the formula to use
      f = "x ~ y"
      #residualize
      resid(lm(formula = f, data = data, weights = wtds))
    }

    residualizer2 = function(data, x, y, wtds) {
      #the formula to use
      f = as.formula("x ~ y")
      #residualize
      resid(lm(formula = f, data = data, weights = wtds))
    }

    d_example = data.frame(x = rnorm(10), y = rnorm(10))
    weightsvar = runif(10)

And the test:

    > residualizer(data = d_example, x = "x", y = "y", wtds = weightsvar)
    Error in eval(expr, envir, enclos) : object 'wtds' not found
    > residualizer2(data = d_example, x = "x", y = "y", wtds = weightsvar)
             1          2          3          4          5          6          7          8          9         10
     0.8986584 -1.1218003  0.6215950 -0.1106144  0.1042559  0.9997725 -1.1634717  0.4540855 -0.4207622 -0.8774290

This is a very subtle bug. If you step into the function environment with browser(), the weight vector is right there, yet somehow it is not found in the lm call!

The error is even harder to debug if you used the name weights for the weight variable. In that case, since lm cannot find the weights object, it falls back on the function weights() (from the stats package), which throws an even stranger error:

    Error in model.frame.default(formula = f, data = data, weights = weights,  :
      invalid type (closure) for variable '(weights)'
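One possible way to sidestep the lookup problem, sketched here as an assumption and not as something the original answer proposes, is to store the weights as a column of the data frame. model.frame looks in the data argument before it consults the formula's environment, so the weights should be found even when the formula is a plain string. The function name residualizer_col and the column name .wtds are made up for illustration.

```r
# Hypothetical variant of residualizer (name and column are illustrative).
residualizer_col <- function(data, wtds) {
  f <- "x ~ y"          # still a plain string, as in residualizer
  data$.wtds <- wtds    # weights now live inside the data frame
  resid(lm(formula = f, data = data, weights = .wtds))
}

set.seed(1)
d_example <- data.frame(x = rnorm(10), y = rnorm(10))
weightsvar <- runif(10)

res <- residualizer_col(d_example, weightsvar)
print(length(res))
```

Converting the string with as.formula inside the function, as residualizer2 does, remains the more direct fix. The column approach is only an alternative when the formula has to stay a string.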

Do not ask me how many hours it took me to figure this out.
