Manually set the coefficient for a new factor level in forecasting

Question


I have a linear model in which one of the independent variables is a factor, and I am trying to make predictions on a dataset that contains a new factor level (a level that was not present in the data used to fit the model). I want to make predictions for observations with the new level by manually specifying the coefficient applied to it. For example, suppose I model daily sales for three types of stores, and a fourth type of store is added to the dataset. I have no historical data for it, but I can assume that it will behave like some weighted combination of the other store types, for which I do have model coefficients.

If I apply predict.lm() to the new data, I get an error telling me that the factor has new levels (which makes sense).

    df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
    lm1 <- lm(y ~ x1, data = df)
    newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))
    predict(lm1, newdata)
    Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
      factor x1 has new levels 5

I could do the prediction manually by multiplying the coefficients by the corresponding columns of the data.frame. However, this is cumbersome: the real model I am working with has many variables and interaction terms, and I want to be able to cycle easily through different model specifications by changing the model formula. Is there a way to simply add a new coefficient to the model object and then use it to make predictions? If not, is there another approach that is less cumbersome than coding the entire prediction step by hand?

+7

r

Abiel Aug 19 '13 at 0:43


2 answers

Neal fultz · Answer 1 · 2013-10-23T00:09:43+0000

If you want level 5 to be an evenly weighted combination of the existing levels, you can build the model matrix for the new data, plug in 25% for each existing level's column, and multiply it by the coefficients from the model:

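    # Design matrix columns: (Intercept), x12, x13, x14, x15
    n.mat <- model.matrix(~x1, data = newdata)
    # Rows at the new level 5 get 0.25 on each existing dummy column
    n.mat[n.mat[, 5] == 1, 2:4] <- .25
    # Drop the level-5 column, which has no coefficient in lm1
    n.mat <- n.mat[, -5]
    n.prediction <- n.mat %*% coef(lm1)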
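The same idea can be extended to unequal weights. The following is only a sketch, assuming the default treatment contrasts (level 1 is the baseline) and a hypothetical weight vector `w` over the existing levels that sums to 1; the name `w` and its values are illustrative and do not come from the question.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(x1 = factor(rep(1:5, 20)))

# Hypothetical weights on the existing levels 1..4 (assumed to sum to 1)
w <- c("1" = 0.4, "2" = 0.3, "3" = 0.2, "4" = 0.1)

# Design matrix for the new data: (Intercept), x12, x13, x14, x15
X <- model.matrix(~ x1, data = newdata)
is_new <- X[, "x15"] == 1

# With treatment contrasts the baseline weight is implied by the intercept,
# so only the weights for levels 2..4 are written into the dummy columns
X[is_new, c("x12", "x13", "x14")] <-
  matrix(w[c("2", "3", "4")], nrow = sum(is_new), ncol = 3, byrow = TRUE)

# Keep only the columns the fitted model has coefficients for
X <- X[, names(coef(lm1)), drop = FALSE]
pred <- drop(X %*% coef(lm1))
```

Under these assumptions, the prediction for a level-5 row equals the `w`-weighted average of the fitted means of levels 1 to 4. Selecting columns by name rather than by position keeps the code from silently misaligning if the factor's level order changes, although interaction terms would need the same treatment for their own columns.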

bansal98 · Answer 2 · 2014-04-24T18:08:08+0000

Here is what you could do:

Stack the training and test data sets with rbind.
Convert the predictor to a factor on the combined data.
Split the stack back into training and test data sets.

Thus, all levels will be present in both datasets.
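The steps above can be sketched as follows. The data frame names `train` and `test` are hypothetical stand-ins for the question's `df` and `newdata`, with the predictor stored as plain integers before conversion.

```r
set.seed(1)
train <- data.frame(y = rnorm(100), x1 = rep(1:4, 25))
test  <- data.frame(y = rnorm(100), x1 = rep(1:5, 20))
n_train <- nrow(train)

# 1. Stack both data sets
stacked <- rbind(train, test)

# 2. Convert the predictor to a factor on the combined data,
#    so the level set covers every value seen in either data set
stacked$x1 <- factor(stacked$x1)

# 3. Split back into training and test parts; row subsetting keeps all levels
train_f <- stacked[seq_len(n_train), ]
test_f  <- stacked[-seq_len(n_train), ]

levels(train_f$x1)
levels(test_f$x1)
```

This only aligns the declared factor levels of the two data sets. The training rows still contain no observations of level 5, so the data alone cannot determine a coefficient for it.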
