Why does this simple function calling `lm(..., subset)` fail?

I am working on a custom function that includes an `lm()` call, but for some reason the function does not work, and I cannot understand why it fails.

Consider this example, simplified to the bare bones:

```r
myfun <- function(form., data., subs., ...){
  lm(form., data., subs., ...)
}
```

This will result in an error:

```r
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'subs.' not found
```

However, using lm() directly will work fine:

```r
lm(mpg ~ cyl + hp, mtcars, TRUE)
##
## Call:
## lm(formula = mpg ~ cyl + hp, data = mtcars, subset = TRUE)
##
## Coefficients:
## (Intercept)          cyl           hp
##    36.90833     -2.26469     -0.01912
```

I tried debugging, but I still cannot figure out the core of the problem. Why doesn't my function work? `subs.` was clearly passed to the function...


Edit:

While most of the solutions suggested below help in this simple case, the function still fails if I add a simple twist. For example, `expand.model.frame()` relies on the formula environment, and it fails if I use the usual standard-evaluation solution:

```r
myfun <- function(form., data., subs., ...){
  fit <- lm(form., data.[ subs., ], ...)
  expand.model.frame(fit, ~ drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'data.' not found
```

This is obviously related to the original problem, but I cannot figure out how to fix it. Is the environment of the model formula incorrect?

4 answers

As suggested in the comments, another solution is to avoid the `subset` argument altogether in non-interactive use and use standard evaluation instead:

```r
myfun <- function(form., data., subs., ...){
  lm(form., data.[ subs., ], ...)
}
```

Now this works as expected:

```r
myfun(formula(mpg ~ cyl + hp), mtcars, TRUE)
```
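A quick sketch to check this approach against a direct `lm()` call on a subset that actually removes rows, using the built-in `mtcars` data. Note that with standard evaluation `subs.` has to be an already-evaluated logical vector or row index (here `mtcars$am == 1`), not an unevaluated expression such as `am == 1`; the names `fit_se` and `fit_direct` are only for illustration.

```r
# Standard-evaluation wrapper from the answer: rows are selected with [ ],
# so lm() never has to evaluate a 'subset' expression itself.
myfun <- function(form., data., subs., ...) {
  lm(form., data.[subs., ], ...)
}

# subs. must already be evaluated (a logical vector or row index)
fit_se <- myfun(mpg ~ cyl + hp, mtcars, mtcars$am == 1)

# Reference fit using lm()'s own non-standard 'subset' argument
fit_direct <- lm(mpg ~ cyl + hp, data = mtcars, subset = am == 1)

# Both fits use the same manual-transmission cars, so the coefficients agree
print(all.equal(coef(fit_se), coef(fit_direct)))
print(nobs(fit_se))
```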

However, this is not enough if your function later calls `expand.model.frame()` or similar functions, which apparently are themselves sensitive to the non-standard evaluation of the arguments recorded in the `lm()` call (such as `data` and `subset`). To make the function reliable and avoid surprises, you need to (1) create the formula inside your function (see also the `reformulate()` approach) and (2) subset the data before calling `lm()`, avoiding the `subset` argument altogether.

Like this:

```r
myfun <- function(form., data., subs., ...){
  stopifnot(is.character(form.))
  data. <- data.[ subs., ]
  fit <- lm(as.formula(form.), data., ...)
  expand.model.frame(fit, ~ drat)
}
myfun("mpg ~ cyl + hp", mtcars, TRUE)
```

I tried using only (1) or only (2), but still got lost in strange errors from some functions; only with both (1) and (2) do the errors seem to go away.
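A sketch that tries each ingredient on its own, with `mtcars$am == 1` as the subset; `only_subset`, `only_formula`, `both` and `try_it` are hypothetical names. The comments rely on one documented fact: `expand.model.frame()` re-evaluates the `data` and `subset` expressions stored in `fit$call`, by default in `environment(formula(fit))`. The `only_formula()` case is expected to fail as well, judging from the errors reported in this thread, but that outcome may depend on the R version, so it is printed rather than checked.

```r
# (2) only: subset before lm(), but the formula comes from the caller, so
# expand.model.frame() evaluates the recorded 'data.[subs., ]' in the
# caller's environment, where 'data.' does not exist.
only_subset <- function(form., data., subs.) {
  fit <- lm(form., data.[subs., ])
  expand.model.frame(fit, ~ drat)
}

# (1) only: formula built inside, but the 'subs.' expression is still
# recorded in the call and has to be re-evaluated later.
only_formula <- function(form., data., subs.) {
  fit <- lm(as.formula(form.), data., subs.)
  expand.model.frame(fit, ~ drat)
}

# (1) and (2): the recorded call only refers to names in this function's frame
both <- function(form., data., subs.) {
  data. <- data.[subs., ]
  fit <- lm(as.formula(form.), data.)
  expand.model.frame(fit, ~ drat)
}

manual <- mtcars$am == 1
try_it <- function(expr) tryCatch(expr, error = conditionMessage)

res_subset  <- try_it(only_subset(mpg ~ cyl + hp, mtcars, manual))
res_formula <- try_it(only_formula("mpg ~ cyl + hp", mtcars, manual))
res_both    <- both("mpg ~ cyl + hp", mtcars, manual)

print(res_subset)    # error message about 'data.'
print(res_formula)   # expected, per this thread, to be an error about 'subs.'
print(dim(res_both)) # manual cars only, columns mpg, cyl, hp, drat
```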


This function fails because of the way the `subset` argument is evaluated. From `?lm`:

> All of `weights`, `subset` and `offset` are evaluated in the same way as variables in `formula`, that is first in `data` and then in the environment of `formula`.

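In other words, `lm()` looks for a variable called `subs.` first in `data.` and then in the environment of the formula. The formula `mpg ~ cyl + hp` was created at the top level, so its environment is the global environment, not the evaluation frame of `myfun`. Since neither `data.` nor the global environment contains a variable `subs.`, you get an error.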
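A minimal sketch of that lookup order, assuming only base R and the built-in `mtcars` data; `cars2` and `e` are hypothetical demo objects. It reproduces the error and then shows that `lm()` silently picks up some *other* object called `subs.` (a column of the data, or an object in the formula's environment) instead of the value actually passed to `myfun()`.

```r
myfun <- function(form., data., subs., ...) {
  lm(form., data., subs., ...)
}

# 1. 'subs.' is neither a column of mtcars nor visible from environment(f)
f <- mpg ~ cyl + hp
err <- tryCatch(myfun(f, mtcars, TRUE), error = conditionMessage)
print(err)

# 2. A column called 'subs.' in the data is found first (hypothetical demo data)
cars2 <- mtcars
cars2$subs. <- cars2$cyl == 6
fit_data <- myfun(f, cars2, TRUE)   # uses cars2$subs., not TRUE
print(nobs(fit_data))               # only the 6-cylinder cars

# 3. A 'subs.' object in the formula's environment is found next
e <- new.env()
e$subs. <- mtcars$cyl == 4
environment(f) <- e
fit_env <- myfun(f, mtcars, TRUE)   # uses e$subs., not TRUE
print(nobs(fit_env))                # only the 4-cylinder cars
```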


You can do something like this:

```r
myfun <- function(form., data., subs., ...){
  lm(as.formula(form.), data., subs., ...)
}
```

Call it as `myfun("mpg ~ cyl + hp", mtcars, TRUE)`. This forces the formula to be created in the environment of `myfun`, which contains `subs.`.
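A small sketch to check that claim. The variant below is hypothetical in that it also returns its own evaluation frame, so we can compare it with the environment of the fitted formula; it uses `mtcars$cyl != 8` so that the subset actually matters.

```r
myfun <- function(form., data., subs., ...) {
  fit <- lm(as.formula(form.), data., subs., ...)
  # Return the fit together with this call's evaluation frame for inspection
  list(fit = fit, frame = environment())
}

out <- myfun("mpg ~ cyl + hp", mtcars, mtcars$cyl != 8)

# as.formula() attached the frame it was evaluated in, i.e. myfun's frame
print(identical(environment(formula(out$fit)), out$frame))
# Only the 4- and 6-cylinder cars were used
print(nobs(out$fit))
```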


Building on @ErnestA's answer, you can change your function to make sure `subs.` is present in the environment of the formula `form.`:

```r
myfun <- function(form., data., subs., ...){
  assign("subs.", subs., envir=environment(form.))
  lm(form., data., subs., ...)
}
```

ETA: to avoid polluting the formula's environment, you can create a new child environment instead:

```r
myfun <- function(form., data., subs., ...){
  environment(form.) <- new.env(parent=environment(form.))
  assign("subs.", subs., envir=environment(form.))
  lm(form., data., subs., ...)
}
```
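To see the difference between the two variants, here is a sketch; `myfun_assign`, `myfun_child` and `caller` are hypothetical names. The formula is given an explicit environment `caller`, standing in for the user's environment, so we can check afterwards whether `subs.` was written into it.

```r
# First variant: writes 'subs.' straight into the formula's own environment
myfun_assign <- function(form., data., subs., ...) {
  assign("subs.", subs., envir = environment(form.))
  lm(form., data., subs., ...)
}

# Second variant: writes 'subs.' into a fresh child environment instead
myfun_child <- function(form., data., subs., ...) {
  environment(form.) <- new.env(parent = environment(form.))
  assign("subs.", subs., envir = environment(form.))
  lm(form., data., subs., ...)
}

caller <- new.env()          # stands in for the user's environment
f <- mpg ~ cyl + hp
environment(f) <- caller

fit_child <- myfun_child(f, mtcars, TRUE)
child_leaked <- exists("subs.", envir = caller, inherits = FALSE)   # FALSE

fit_assign <- myfun_assign(f, mtcars, TRUE)
assign_leaked <- exists("subs.", envir = caller, inherits = FALSE)  # TRUE
```

Both calls fit the model; only the first variant leaves a stray `subs.` behind in the caller's environment.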

ETA: perhaps the easiest way to fix the `lm()` problem is to set the environment of `form.` to the evaluation environment of `myfun`:

```r
myfun <- function(form., data., subs., ...){
  environment(form.) <- environment()
  lm(form., data., subs., ...)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
##
## Call:
## lm(formula = form., data = data., subset = subs.)
##
## Coefficients:
## (Intercept)          cyl           hp
##    36.90833     -2.26469     -0.01912
```

As for the `expand.model.frame()` problem: `subs.` is not found, even though it is present in the environment that `?expand.model.frame` says is used (by default `envir = environment(formula(model))`). Is this a bug in `expand.model.frame()`, or at least a conflict with the documentation?

```r
myfun <- function(form., data., subs., ...){
  environment(form.) <- environment()
  fit <- lm(form., data., subs., ...)
  print(ls(environment(formula(fit))))
  expand.model.frame(fit, ~drat )
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## [1] "data." "fit"   "form." "subs."
## Error in eval(expr, envir, enclos) : object 'subs.' not found
```

Assigning `subs.` to the parent environment seems to work.

```r
myfun <- function(form., data., subs., ...){
  environment(form.) <- environment()
  fit <- lm(form., data., subs., ...)
  assign("subs.", subs., envir = parent.env(environment(formula(fit))))
  expand.model.frame(fit, ~drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
##                 mpg cyl  hp drat
## Mazda RX4      21.0   6 110 3.90
## Mazda RX4 Wag  21.0   6 110 3.90
## Datsun 710     22.8   4  93 3.85
## Hornet 4 Drive 21.4   6 110 3.08
## etc.
```

But this pollutes the parent environment, in this case `R_GlobalEnv`. I could not get it to work with anything other than `R_GlobalEnv` as the parent.
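A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: if, as appears to be the case in the R sources, `expand.model.frame()` builds its extended formula from a formula literal written inside its own body, that new formula's environment is the evaluation frame of `expand.model.frame()` itself. `model.frame()` would then look up the recorded `subset` expression in the data and then in that frame, whose enclosing environments run through the stats namespace to `R_GlobalEnv` and the search path, but never through `myfun`'s frame. That would explain why only `R_GlobalEnv` worked. The helper `env_chain()` below is hypothetical; it just lists that chain of enclosing environments.

```r
# List the names of an environment and all of its enclosing environments
env_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))
    env <- parent.env(env)
  }
  out
}

# expand.model.frame() is defined in the stats namespace, so the frame it
# creates while running is enclosed by this chain
chain <- env_chain(environment(expand.model.frame))
print(head(chain, 4))
# The global environment is on the chain; a caller's function frame is not
print("R_GlobalEnv" %in% chain)
```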

