Get the standardized residuals and the Residual vs Fitted graph for the mlm object from `lm ()`

Question

Get the standardized residuals and the Residual vs Fitted graph for the mlm object from `lm ()`

set.seed(0) ## 2 response of 10 observations each response <- matrix(rnorm(20), 10, 2) ## 3 covariates with 10 observations each predictors <- matrix(rnorm(30), 10, 3) fit <- lm(response ~ predictors)

I generated residual graphs for the entire model using:

 plot(fitted(fit),residuals(fit))

However, I would like to make separate graphs for each predictor covariator. I can do them one at a time for:

 f <- fitted(fit) r <- residual(fit) plot(f[,1],r[,1])

However, the problem with this approach is that it must be generalized for data sets with a large number of covariances of predictors. Is there a way that I use the graph when repeating through each (f) and (r) column? Or is there a way that plot() can group each covariance color by color?

-1

r regression linear-regression lm mlm

Martin forde 18 sept. '16 at 21:00

source share

2 answers

This is how I solved it.

 for(i in 1:ncol(f)) { plot(f[,i],r[,i]) }

Blurred mind.

-1

Martin forde 18 sept. '16 at 22:00

source share

李哲源 · Accepted Answer · 2016-09-29T10:42:13+0000

Make sure you use standardized leftovers, not raw leftovers

I often see plot(fitted(fit), residuals(fit)) , but this is statistically incorrect. We use plot(fit) to generate a diagnostic graph because we need standardized residuals, not raw ones. Read ?plot.lm for more information. But the plot method for "mlm" is poorly supported:

 plot(fit) # Error: 'plot.mlm' is not implemented yet

Define method "rstandard" S3 for "mlm"

plot.mlm not supported for many reasons, one of which is the lack of rstandard.mlm . For classes "lm" and "glm", there is a common S3 "rstandard" method for obtaining standardized residues:

 methods(rstandard) # [1] rstandard.glm* rstandard.lm*

Support "mlm" is not supported. Therefore, we will first fill this gap.

It is easy to obtain standardized residues. Let hii be the diagonal of the hat matrix, the point estimate standard error for the remainder sqrt(1 - hii) * sigma , where sigma = sqrt(RSS / df.residual) is the calculated residual standard error. RSS - residual sum of squares; df.residual - residual degree of freedom.

hii can be calculated from the matrix coefficient Q QR factorization of the model matrix: rowSums(Q ^ 2) . For "mlm" there is only one QR decomposition, since the model matrix is the same for all answers, so we only need to calculate hii once.

Different responses have different sigma , but they are elegantly colSums(residuals(fit) ^ 2) / df.residual(fit) .

Now, close these ideas to get our own rstandard method for mlm:

 ## define our own "rstandard" method for "mlm" class rstandard.mlm <- function (model) { Q <- with(model, qr.qy(qr, diag(1, nrow = nrow(qr$qr), ncol = qr$rank))) ## Q matrix hii <- rowSums(Q ^ 2) ## diagonal of hat matrix QQ' RSS <- colSums(model$residuals ^ 2) ## residual sums of squares (for each model) sigma <- sqrt(RSS / model$df.residual) ## ## Pearson estimate of residuals (for each model) pointwise_sd <- outer(sqrt(1 - hii), sigma) ## point-wise residual standard error (for each model) model$residuals / pointwise_sd ## standardised residuals }

Note the use of .mlm in the function name to tell R, this is the S3 method associated with it. Once we have defined this function, we can see it in the "rstandard" method:

 ## now there are method for "mlm" methods(rstandard) # [1] rstandard.glm* rstandard.lm* rstandard.mlm

To call this function, we do not need to explicitly call rstandard.mlm ; just call rstandard :

 ## test with your fitted model `fit` rstandard(fit) # [,1] [,2] #1 1.56221865 2.6593505 #2 -0.98791320 -1.9344546 #3 0.06042529 -0.4858276 #4 0.18713629 2.9814135 #5 0.11277397 1.4336484 #6 -0.74289985 -2.4452868 #7 0.03690363 0.7015916 #8 -1.58940448 -1.2850961 #9 0.38504435 1.3907223 #10 1.34618139 -1.5900891

The standardized N(0, 1) residues are distributed.

Retrieving balances vs established plot for "mlm"

Your initial attempt:

 f <- fitted(fit); r <- rstandard(fit); plot(f, r)

- A good idea, provided that the points for different models can be identified from each other. Therefore, we can try to use different dot colors for different models:

 plot(f, r, col = as.numeric(col(f)), pch = 19)

Graphic arguments such as col , pch and cex can accept vector input. I ask plot use col = j for r[,j] ~ f[,j] , where j = 1, 2,..., ncol(f) . Read the "Color Specification" ?par , which means col = j . pch = 19 tells plot to draw solid points. Read the basic graphic options for the various options.

Finally, you may need a legend. You can do

 plot(f, r, col = as.numeric(col(f)), pch = 19, ylim = c(-3, 4)) legend("topleft", legend = paste0("response ", 1:ncol(f)), pch = 19, col = 1:ncol(f), text.col = 1:ncol(f))

To leave room for the legend window, expand ylim . Since the standardized residues are N(0,1) , ylim = c(-3, 3) is a good range. If we want to place the legend window in the upper left corner, we expand ylim to c(-3, 4) ylim c(-3, 4) . You can customize your legend even more with ncol , title , etc.

How many answers do you have?

If you have no more than a few answers, the sentence above works well. If you have a lot, they can be decomposed into a separate area. A for , as you found out, is decent, except that you need to divide the plotting area into different subheadings, possibly using par(mfrow = c(?, ?)) . Also set the inner edge of mar and the outer edge of oma if you take this approach. You can read How to create a more pleasant plot for my categorical time series in the matrix? for example.

If you have even more answers, you might need a mixture of both? Say, if you have 42 answers, you can do par(mfrow = c(2, 3)) , and then build 7 answers in each sub-figure. Now the decision is more based on opinions.

Get the standardized residuals and the Residual vs Fitted graph for the mlm object from `lm ()`

More articles: