I have a linear model in which one of the independent variables is a factor, and I am trying to make predictions for a dataset that contains a new factor level (a level that did not appear in the data the model was estimated on). I want to be able to make predictions for observations with the new level by manually specifying the coefficient to apply to it. For example, suppose I model daily sales for three types of stores, and then a fourth type of store enters the dataset. I have no historical data for it, but I can assume it will behave like some weighted combination of the other store types, for which I do have model coefficients.
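To make the "weighted combination" idea concrete, here is a minimal sketch. It assumes R's default treatment contrasts, so the first level is the baseline and the other coefficients are offsets from it. The weights `w` are hypothetical and stand in for whatever mix of existing store types the new one is assumed to resemble.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)

# Under default treatment contrasts, level "1" is the baseline
# (effect 0) and x12, x13, x14 are offsets from it.
level_effects <- c(0, coef(lm1)[c("x12", "x13", "x14")])

# Hypothetical weights: the new store type behaves like an equal mix
# of the four existing types. The weights should sum to 1.
w <- c(0.25, 0.25, 0.25, 0.25)

# Coefficient to use for the new level, and the implied prediction
new_effect <- sum(w * level_effects)
pred_new <- coef(lm1)[["(Intercept)"]] + new_effect
pred_new
```

With equal weights and equally sized groups, as here, `pred_new` equals the overall mean of `y`, which is a convenient sanity check.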
If I try to apply `predict.lm()` to the new data, I get an error saying that the factor has new levels (which makes sense):

```r
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))
predict(lm1, newdata)
## Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
##   factor x1 has new levels 5
```
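The error comes from the factor levels that `lm()` stores in `lm1$xlevels`; `predict.lm()` passes them to `model.frame()` as `xlev`, and any unseen level is rejected. A small sketch of checking which rows are affected, and predicting the rest normally, might look like this (the names `is_new`, `known` and `pred_known` are just illustrative):

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))

# Levels recorded at fitting time; these are what predict.lm() checks against
lm1$xlevels$x1

# Flag rows whose level was never seen when the model was fitted
is_new <- !(as.character(newdata$x1) %in% lm1$xlevels$x1)

# Rows with known levels can still be predicted; droplevels() removes
# the now-empty level "5" from the subset's factor.
known <- droplevels(newdata[!is_new, ])
pred_known <- predict(lm1, known)
```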
I could do the prediction manually by multiplying the coefficients by the corresponding columns of the data.frame. However, this is cumbersome: the real model I am working with has many variables and interaction terms, and I want to be able to cycle easily through different model specifications by changing the formula. Is there a way to manually add a new coefficient to the model object and then use it to make predictions? If not, is there another approach that is less cumbersome than coding the entire prediction step by hand?
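One alternative that avoids editing the model object is sketched below; it is a different technique from adding a coefficient, not a documented feature of `lm`. A linear model's prediction is linear in the columns generated for the factor (including its interaction columns), so predicting each unseen row once per known level and averaging those predictions with weights that sum to 1 gives the same result as plugging in the weighted combination of coefficients. The helper `predict_new_level()` and its arguments are hypothetical, and the sketch assumes only one factor has unseen levels.

```r
# Hypothetical helper: predict rows with an unseen level of `var` by
# averaging predictions over known levels with user-chosen weights.
predict_new_level <- function(model, newdata, var, weights) {
  known_levels <- model$xlevels[[var]]
  stopifnot(all(names(weights) %in% known_levels),
            isTRUE(all.equal(sum(weights), 1)))
  is_new <- !(as.character(newdata[[var]]) %in% known_levels)
  out <- rep(NA_real_, nrow(newdata))

  # Rows with known levels: ordinary predictions
  if (any(!is_new)) {
    known <- newdata[!is_new, , drop = FALSE]
    known[[var]] <- factor(as.character(known[[var]]), levels = known_levels)
    out[!is_new] <- predict(model, known)
  }

  # Rows with unseen levels: weighted average of per-level predictions
  if (any(is_new)) {
    unseen <- newdata[is_new, , drop = FALSE]
    acc <- numeric(nrow(unseen))
    for (lev in names(weights)) {
      unseen[[var]] <- factor(rep(lev, nrow(unseen)), levels = known_levels)
      acc <- acc + weights[[lev]] * predict(model, unseen)
    }
    out[is_new] <- acc
  }
  out
}

# Usage with an interaction term, which the approach handles unchanged
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)), x2 = rnorm(100))
lm2 <- lm(y ~ x1 * x2, data = df)
newdata <- data.frame(x1 = factor(rep(1:5, 20)), x2 = rnorm(100))
w <- c("1" = 0.25, "2" = 0.25, "3" = 0.25, "4" = 0.25)  # hypothetical weights
p <- predict_new_level(lm2, newdata, "x1", w)
```

Because the helper only calls `predict()`, it keeps working when the formula changes, which is the main reason to prefer it over building the design matrix by hand.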
r
Abiel