Manually set the coefficient for a new factor level in forecasting

I have a linear model in which one of the independent variables is a factor, and I am trying to make predictions for a dataset that contains a new factor level (a level that did not appear in the data the model was estimated on). I want to predict observations with this new level by manually specifying the coefficient to apply to it. For example, suppose I model daily sales for three types of stores, and a fourth type of store enters the data. I have no historical data for it, but I can assume that it will behave like some balanced combination of the other stores, for which I do have model coefficients.

If I apply predict.lm() to the new data, I get an error saying that the factor has new levels, which makes sense:

    df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
    lm1 <- lm(y ~ x1, data = df)
    newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))
    predict(lm1, newdata)
    # Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
    #   factor x1 has new levels 5
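A small sketch of where the check comes from, using the same simulated data with a seed added for reproducibility: the fitted model stores the levels it saw in `lm1$xlevels`, and `predict()` compares `newdata` against them (the `xlev = object$xlevels` in the error). Rows whose level is in that set can still be predicted once the new level is set aside. `known`, `is_new` and `pred_known` are illustrative names only.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))

# Capture the error message instead of stopping
err <- tryCatch(predict(lm1, newdata), error = function(e) conditionMessage(e))

# Levels the model saw during fitting
known <- lm1$xlevels$x1
unseen_levels <- setdiff(levels(newdata$x1), known)

# Flag rows whose level the model has never seen
is_new <- !(as.character(newdata$x1) %in% known)

# Rows with known levels predict normally; droplevels() removes the unused level 5
pred_known <- predict(lm1, droplevels(newdata[!is_new, , drop = FALSE]))
```

Using a logical mask rather than negative indices avoids the `x[-integer(0), ]` pitfall, which would silently return zero rows when no new levels are present.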

I could do the prediction manually by multiplying the coefficients by the corresponding columns of the data.frame. However, this is cumbersome: the real model I am working with has many variables and interaction terms, and I want to cycle easily through different model specifications by changing the model formula. Is there a way to manually add a new coefficient to the model object and then use it for forecasting? If not, is there an approach that is less cumbersome than coding the entire prediction step by hand?
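The manual route does not have to be written term by term. A minimal sketch, assuming the question's toy model: `model.matrix()` expands the formula (dummies, interactions and so on) into columns, so the manual prediction is a single matrix product. The steps roughly mirror what `predict.lm()` does internally, and the result matches `predict()` on the training data.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)

# Terms without the response, so new data does not need a y column
tt <- delete.response(terms(lm1))

# Build the model frame with the levels stored at fit time, then the design matrix
mf <- model.frame(tt, data = df, xlev = lm1$xlevels)
X <- model.matrix(tt, mf, contrasts.arg = lm1$contrasts)

# One matrix product replaces multiplying each coefficient by its column
manual <- drop(X %*% coef(lm1))
```

Because `X` is derived from the formula, the same three lines keep working when the formula changes.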

2 answers

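If you want level 5 to be an equally weighted combination of the other four levels, you can build the model matrix for the new data, set the level-2 to level-4 dummy columns to 0.25 in the level-5 rows (level 1 is the baseline absorbed by the intercept, so it implicitly receives the remaining 0.25), drop the level-5 column, and multiply the result by the model's coefficients: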

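    n.mat <- model.matrix(~x1, data = newdata)  # columns: (Intercept), x12, x13, x14, x15
    n.mat[n.mat[, 5] == 1, 2:4] <- .25          # level-5 rows: 0.25 on each non-baseline dummy
    n.mat <- n.mat[, -5]                        # drop x15, which has no coefficient
    n.prediction <- n.mat %*% coef(lm1)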
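The column positions in this answer (`5`, `2:4`) are specific to the one-factor formula `~x1`. Below is a minimal, formula-agnostic sketch of the same idea; `predict_blend` is a hypothetical helper, not an existing function. For rows with an unseen level it substitutes each known level in turn, calls `predict()`, and takes a weighted average (equal weights by default, in the order of `model$xlevels[[var]]`). For a model that is linear in its coefficients, averaging predictions equals averaging model-matrix rows, so it should reproduce the answer's result, while `predict()` itself takes care of interactions and other terms.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))

# Hypothetical helper: unseen levels of `var` get a weighted average of the
# predictions obtained by substituting each level the model was fitted on.
predict_blend <- function(model, newdata, var, weights = NULL) {
  known <- model$xlevels[[var]]
  if (is.null(weights)) weights <- rep(1 / length(known), length(known))
  stopifnot(length(weights) == length(known))
  is_new <- !(as.character(newdata[[var]]) %in% known)
  out <- rep(NA_real_, nrow(newdata))
  if (any(!is_new)) {
    # Known levels: ordinary predict(); droplevels() removes the unused new level
    out[!is_new] <- predict(model, droplevels(newdata[!is_new, , drop = FALSE]))
  }
  if (any(is_new)) {
    unseen <- newdata[is_new, , drop = FALSE]
    # One column of predictions per known level (local copy of `unseen` each time)
    per_level <- do.call(cbind, lapply(known, function(lv) {
      unseen[[var]] <- factor(rep(lv, nrow(unseen)), levels = known)
      predict(model, unseen)
    }))
    out[is_new] <- drop(per_level %*% weights)
  }
  out
}

blend <- predict_blend(lm1, newdata, "x1")

# Cross-check against the model-matrix approach from the answer
n.mat <- model.matrix(~x1, data = newdata)
n.mat[n.mat[, 5] == 1, 2:4] <- .25
n.mat <- n.mat[, -5]
n.prediction <- drop(n.mat %*% coef(lm1))
```

Unequal weights, e.g. `predict_blend(lm1, newdata, "x1", weights = c(0.5, 0.5, 0, 0))`, would treat the new store type as a mix of the first two types only.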

Here is what you could do:

  • Stack the training and test datasets into one data frame with rbind.
  • Re-factor the predictor on the stacked data.
  • Split the stack back into training and test datasets.

This way, the factor carries the same set of levels in both datasets.
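The three steps can be sketched as follows, assuming `train` and `test` data frames shaped like the question's `df` and `newdata` (these names and the `set` column are illustrative). The sketch covers only the data handling: after the split both data frames share the same factor levels, but the training rows still contain no observations of level 5, so this step alone does not estimate a coefficient for it.

```r
set.seed(1)
train <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
test  <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))

# 1. Stack both datasets, tagging each row with where it came from
stacked <- rbind(cbind(train, set = "train"), cbind(test, set = "test"))

# 2. Re-factor the predictor on the stacked data so it carries every level
stacked$x1 <- factor(as.character(stacked$x1))

# 3. Split the stack back into training and test data
train2 <- stacked[stacked$set == "train", c("y", "x1")]
test2  <- stacked[stacked$set == "test",  c("y", "x1")]
```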
