For example, consider a basic regression model in R:
form1 <- Petal.Length ~ Sepal.Length + Sepal.Width
fit1 <- lm(form1, iris)
(I apologize to any nerds who post here.)
To add quadratic and interaction terms, I know of three approaches:
1) The old-fashioned way: enter the terms one at a time:
form2 <- . ~ Sepal.Length*Sepal.Width + I(Sepal.Length^2) + I(Sepal.Width^2)
fit2 <- update(fit1, form2)
This does not scale beyond small formulas, and it cannot be generated programmatically.
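As a quick check of what approach 1 produces, the sketch below refits the model and lists the expanded term labels. In an R formula, a*b expands to a + b + a:b, so the main effects appear only once even though they are already part of form1, and terms() lists main effects before the interaction.

```r
# Base model, as in the question
form1 <- Petal.Length ~ Sepal.Length + Sepal.Width
fit1 <- lm(form1, data = iris)

# Approach 1: spell out every extra term by hand
fit2 <- update(fit1, . ~ Sepal.Length*Sepal.Width +
                 I(Sepal.Length^2) + I(Sepal.Width^2))

# a*b expands to a + b + a:b; terms() orders main effects before interactions
labels2 <- attr(terms(fit2), "term.labels")
print(labels2)
```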
2) The ugly way: string processing:
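# term labels of the base model: "Sepal.Length" "Sepal.Width"
vars <- attr(terms(form1), "term.labels")
# "I(Sepal.Length^2)" "I(Sepal.Width^2)"
squared_terms <- sprintf("I(%s^2)", vars)
# "Sepal.Length*Sepal.Width"
inter_terms <- combn(vars, 2, paste, collapse = "*")
# . ~ Sepal.Length * Sepal.Width + I(Sepal.Length^2) + I(Sepal.Width^2)
form2 <- reformulate(c(inter_terms, squared_terms), ".")
fit2 <- update(fit1, form2)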
This scales, but it is not very programmable, because the transformations themselves (the "^2" and "*" strings) must be hard-coded.
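One way to loosen the hard-coding, while staying with strings, is to pass the transformations in as `sprintf()` templates. The helper `expand_formula()` below is hypothetical (not from any package) and only a sketch: it moves the hard-coded strings into arguments, it assumes at least two predictors (otherwise `combn()` fails), and the pair template must contain two `%s` placeholders.

```r
# Hypothetical helper: builds the extra terms from sprintf() templates.
# `single` is applied to each variable, `pair` to each pair of variables.
expand_formula <- function(f, single = "I(%s^2)", pair = "%s:%s") {
  vars <- attr(terms(f), "term.labels")
  single_terms <- sprintf(single, vars)
  pair_terms <- combn(vars, 2, function(v) sprintf(pair, v[1], v[2]))
  # response "." keeps the response of the model passed to update()
  reformulate(c(vars, single_terms, pair_terms), response = ".")
}

fit1 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)
fit2 <- update(fit1, expand_formula(formula(fit1)))

# A different transformation only needs a different template
fit3 <- update(fit1, expand_formula(formula(fit1), single = "log(%s)"))
```

The templates are still strings, so this does not remove the core objection; it only keeps them out of the body of the code.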
3) The "back door": manipulating the data directly:
library(lazyeval)
library(dplyr)
# note: the underscored verbs such as mutate_() are deprecated in current dplyr
square <- function(v) interp(~ I(v1^2), v1 = as.name(v))
inter <- function(v) interp(~ v1*v2, v1 = as.name(v[1]), v2 = as.name(v[2]))
vars <- attr(terms(form1), "term.labels")
# named lists of formulas; the names become the new column names
squared_terms <- lapply(vars, square) %>%
  setNames(paste0(vars, " ^2"))
inter_terms <- combn(vars, 2, inter, simplify = FALSE) %>%
  setNames(combn(vars, 2, paste, collapse = " x "))
# add the new columns to the model frame, then regress on all of them
fit2 <- model.frame(fit1) %>%
  mutate_(.dots = squared_terms) %>%
  mutate_(.dots = inter_terms) %>%
  lm(Petal.Length ~ ., data = .)
This is quite scalable, and it is programmable even over the variable names. But it also looks crazy, because it defeats the purpose of using a formula in the first place.
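For comparison, the data-manipulation approach does not actually need lazyeval or the deprecated underscored dplyr verbs. The base R sketch below builds the same squared and product columns with plain column assignment; the column-naming scheme (`_sq`, `_x_`) is an arbitrary choice for illustration. It fits the same model space as approach 1, so the fitted values agree.

```r
form1 <- Petal.Length ~ Sepal.Length + Sepal.Width
fit1 <- lm(form1, data = iris)

vars <- attr(terms(form1), "term.labels")
mf <- model.frame(fit1)

# One squared column per predictor
for (v in vars) mf[[paste0(v, "_sq")]] <- mf[[v]]^2

# One product column per pair of predictors
for (p in combn(vars, 2, simplify = FALSE)) {
  mf[[paste(p, collapse = "_x_")]] <- mf[[p[1]]] * mf[[p[2]]]
}

# Regress the response on every column of the augmented frame
fit3 <- lm(Petal.Length ~ ., data = mf)
```

It still shares the drawback noted above: the model is expressed through the data rather than through the formula.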
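What I would like to do is something like the following, where apply.formula(), combn.formula() and append.formula() are made-up functions describing the interface I am after: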
library(lazyeval)
library(dplyr)
square <- function(v) interp(~ I(v1^2), v1 = as.name(v))
inter <- function(v) interp(~ v1*v2, v1 = as.name(v[1]), v2 = as.name(v[2]))
squared_terms <- apply.formula(form1, square)
inter_terms <- combn.formula(form1, 2, inter)
fit2 <- form1 %>%
  append.formula(squared_terms) %>%
  append.formula(inter_terms) %>%
  update(fit1, .)
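Is there something like dplyr, but for formulas? In other words, is there a fourth approach that combines the readability of approach 1 with the programmability of approach 3?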
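The wished-for interface can be approximated in base R by treating formulas as language objects instead of strings. The helpers below (`formula_terms()`, `apply_formula()`, `combn_formula()`, `append_formula()`) are hypothetical, named after the made-up `*.formula` functions, and are only a sketch: `bquote()` builds each new term, and the terms are spliced onto the right-hand side with `call("+", ...)`. It assumes a two-sided formula and R >= 3.6.0 (for `str2lang()`).

```r
form1 <- Petal.Length ~ Sepal.Length + Sepal.Width
fit1 <- lm(form1, data = iris)

# Hypothetical: right-hand-side terms of a formula as language objects
formula_terms <- function(f) lapply(attr(terms(f), "term.labels"), str2lang)

# Hypothetical: apply a term builder to every term
apply_formula <- function(f, fun) lapply(formula_terms(f), fun)

# Hypothetical: apply a term builder to every m-way combination of terms
combn_formula <- function(f, m, fun) {
  trms <- formula_terms(f)
  lapply(combn(length(trms), m, simplify = FALSE), function(i) fun(trms[i]))
}

# Hypothetical: append language objects to the right-hand side of `f`
append_formula <- function(f, new_terms) {
  rhs <- Reduce(function(acc, term) call("+", acc, term), new_terms, f[[3]])
  out <- eval(call("~", f[[2]], rhs))  # evaluating `~` yields a formula
  environment(out) <- environment(f)   # keep the original environment
  out
}

# Term builders; .() splices the term into the template
square <- function(v) bquote(I(.(v)^2))
inter <- function(v) bquote(.(v[[1]]) * .(v[[2]]))

squared_terms <- apply_formula(form1, square)
inter_terms <- combn_formula(form1, 2, inter)
form2 <- append_formula(append_formula(form1, squared_terms), inter_terms)
fit2 <- update(fit1, form2)
print(form2)
```

Because the terms stay language objects, swapping `square` for, say, `function(v) bquote(log(.(v)))` changes the transformation without touching any strings.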