How to manually set coefficients for variables in a linear model?

In R, how can I set weights for specific variables, rather than for observations, in the lm() function?

The context is as follows. I am trying to create a personal ranking system for certain products, for example phones. I can build a linear model with price as the dependent variable and features such as screen size, memory, OS, etc. as independent variables. Then I can use it to predict the real value of a phone (as opposed to its declared price), thereby finding the best price/quality ratio. This part I have already done.
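For concreteness, the setup looks roughly like this (the data and column names below are made up, since the question gives none):

```r
# Hypothetical phone data; all values are made up for illustration
phones <- data.frame(
  price       = c(300, 450, 520, 610, 700),
  memory      = c(16, 32, 64, 64, 128),
  screen_size = c(4.7, 5.0, 5.5, 5.8, 6.1)
)

# Linear model: price explained by the features
fit <- lm(price ~ memory + screen_size, data = phones)

# Predicted ("fair") price vs. declared price
phones$fair_price <- predict(fit)
phones$value      <- phones$fair_price / phones$price  # > 1 suggests underpriced
```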

Now I want to emphasize some features that matter only to me. For example, I might need a phone with a lot of memory, so I want to give it a higher weight, so that the linear model is optimized for the memory variable.

The lm() function in R has a weights parameter, but those are weights for observations, not for variables (correct me if I am wrong). I also tried to play with the formula, but only got interpreter errors. Is there a way to set weights for variables in lm()?

Of course, the lm() function is not the only option. If you know how to do this with other similar tools (e.g. glm()), that would be fine too.
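For reference, this is what the weights argument actually does: it expects one weight per observation (row), not per variable (a sketch with made-up data):

```r
set.seed(1)
d <- data.frame(x = rnorm(20), y = rnorm(20))

# One weight per row: the weights vector must match nrow(d), not ncol(d)
w <- runif(20)
fit_w <- lm(y ~ x, data = d, weights = w)

# A vector with one weight per *variable* has the wrong length
# and lm() rejects it:
# lm(y ~ x, data = d, weights = c(1, 2))  # error: variable lengths differ
```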

UPD. After a few comments I realized that the way I was thinking about the problem is wrong. The linear model produced by lm() gives optimal coefficients for the training examples, and there is no way (and no need) to change the weights of the variables; sorry for the confusion. What I am actually looking for is a way to change the coefficients in an existing linear model, in order to manually make some parameters more important than others. Continuing the previous example, let's say we have the following formula for the price:

 price = 300 + 30 * memory + 56 * screen_size + 12 * os_android + 9 * os_win8 

This formula describes the best possible linear model for the dependence of price on phone parameters. However, now I want to manually change the coefficient 30 in front of the memory variable to, say, 60, so it looks like this:

 price = 300 + 60 * memory + 56 * screen_size + 12 * os_android + 9 * os_win8 

Of course, this formula no longer reflects the optimal relationship between price and phone parameters. Nor does the dependent variable represent the actual price any more, but rather some measure of goodness, given that memory is twice as important to me as to the average person (judging by the coefficients in the first formula). But this goodness value (or, more precisely, the goodness/price ratio) is exactly what I need: with it I can find the best (in my opinion) phone at the best price.
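In other words, the personalized goodness/price ratio can be computed directly from the modified coefficients (a sketch using the two formulas above; the phone data and price are made up):

```r
# Coefficients from the fitted model, then with memory doubled
coefs <- c(intercept = 300, memory = 30, screen_size = 56,
           os_android = 12, os_win8 = 9)
my_coefs <- coefs
my_coefs["memory"] <- 60

# One hypothetical phone, in the same order as the coefficients
phone <- c(intercept = 1, memory = 64, screen_size = 5.5,
           os_android = 1, os_win8 = 0)
price <- 1800  # declared price, made up

goodness <- sum(my_coefs * phone)
goodness / price  # higher ratio = better deal by my personal criteria
```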

Hope this all makes sense. Now I have one (possibly very simple) question: how can I manually set the coefficients in an existing linear model obtained with lm()? That is, I am looking for something like:

 coef(model)[2] <- 60 

This code does not work, but you get the idea. Note: obviously I could simply double the values in the memory column of the data frame, but I am looking for a more elegant solution that affects the model, not the data.

2 answers

The following is a bit more complicated, because lm() minimizes the residual sum of squares, and a fixed, non-optimal coefficient will generally not be part of that minimum, so fixing one coefficient works against what lm() is trying to do. The only way out is to fix all the other coefficients as well.

To do this, you first need to know the coefficients of the unrestricted model. All adjustments have to be made by changing your model formula, e.g. we have price ~ memory + screen_size, and of course there is an implicit intercept. Now, neither changing the data directly nor using I(c*memory) is a good idea: I(c*memory) is like temporarily changing the data too, and changing just one coefficient by transforming variables would be much harder.

So, first change price ~ memory + screen_size to price ~ offset(c1*memory) + offset(c2*screen_size). But this does not fix the intercept, which lm() would still try to estimate by minimizing the residual sum of squares, possibly ending up different from the one in the original model. The last step is to remove the intercept and add a fixed constant term in its place, i.e. one with the same number of observations as the other variables:
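A minimal illustration of what offset() does (with made-up data): a term wrapped in offset() enters the predictor with a fixed coefficient of 1, so lm() no longer estimates anything for it:

```r
set.seed(42)
d <- data.frame(memory = rpois(20, 32),
                price  = rnorm(20, mean = 500, sd = 50))

# Fix the memory coefficient at 60 by moving the term into an offset;
# only the intercept is left for lm() to estimate
fit <- lm(price ~ offset(60 * memory), data = d)

coef(fit)  # a single estimated coefficient: the intercept
```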

price ~ offset(c1*memory) + offset(c2*screen_size) + offset(rep(c0, length(memory))) - 1

    # Function to fix coefficients
    setCoeffs <- function(frml, weights, len){
      el <- paste0("offset(", weights[-1], "*",
                   unlist(strsplit(as.character(frml)[-(1:2)], " +\\+ +")), ")")
      el <- c(paste0("offset(rep(", weights[1], ",", len, "))"), el)
      as.formula(paste(as.character(frml)[2], "~",
                       paste(el, collapse = " + "), " + -1"))
    }

    # Example data
    df <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 5),
                     y = rnorm(10, mean = 3, sd = 10))

    # Writing the formula explicitly
    frml <- y ~ x1 + x2

    # Basic model
    mod <- lm(frml, data = df)

    # Initial coefficients plus any modifications; note that "weights"
    # contains the intercept value too
    weights <- mod$coef

    # Setting the coefficient of x1; all the rest remain the same
    weights[2] <- 3

    # Final model
    mod2 <- update(mod, setCoeffs(frml, weights, nrow(df)))
    # It is fine that mod2 reports "No coefficients"

In addition, you are probably going to use mod2 only for prediction (in fact, I don't see where else it could be used now), so the same could be done in a simpler way, without setCoeffs:

    # Data for prediction, with e.g. price unknown
    df2 <- data.frame(x1 = rpois(10, 10), x2 = rpois(5, 5), y = NA)
    mat <- model.matrix(frml, model.frame(frml, df2, na.action = NULL))
    # Predictions
    rowSums(t(t(mat) * weights))
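As a side note, the rowSums(t(t(mat) * weights)) line is just a matrix product, which may read more clearly; a self-contained check of the equivalence, reproducing the answer's setup with made-up data:

```r
set.seed(1)
df <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 5),
                 y  = rnorm(10, mean = 3, sd = 10))
frml <- y ~ x1 + x2

# Fitted coefficients, with the x1 coefficient overridden manually
weights <- coef(lm(frml, data = df))
weights[2] <- 3

# New data with the response unknown
df2 <- data.frame(x1 = rpois(10, 10), x2 = rpois(10, 5), y = NA)
mat <- model.matrix(frml, model.frame(frml, df2, na.action = NULL))

f1 <- rowSums(t(t(mat) * weights))  # the answer's version
f2 <- drop(mat %*% weights)         # plain matrix product, same numbers
```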

It looks like you are doing optimization, not model fitting (although there is optimization inside model fitting). You probably want something like the optim function, or look into linear or quadratic programming (the linprog and quadprog packages).
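For instance, minimizing the residual sum of squares with the memory coefficient pinned at a chosen value can be done directly with optim (a sketch with made-up data; only the free coefficients are optimized):

```r
set.seed(1)
d <- data.frame(memory = rpois(30, 32), screen_size = runif(30, 4, 6))
d$price <- 300 + 30 * d$memory + 56 * d$screen_size + rnorm(30, sd = 20)

# Residual sum of squares with the memory coefficient fixed at 60;
# par = c(intercept, screen_size coefficient)
rss <- function(par) {
  pred <- par[1] + 60 * d$memory + par[2] * d$screen_size
  sum((d$price - pred)^2)
}

fit <- optim(c(0, 0), rss)
fit$par  # intercept and screen_size coefficient, with memory pinned at 60
```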

If you insist on using modeling tools such as lm, use an offset term in the formula to specify your own multiplier rather than having one computed.

