Get Hat Matrix from QR Decomposition for Weighted Least Square Regression

I am trying to extend the lwr() function of the lwr package, which performs locally weighted regressions as a nonparametric estimator. At its core, lwr() inverts a matrix with solve() instead of using a QR decomposition, which leads to numerical instability. I would like to change that, but I cannot work out how to obtain the hat matrix (and related quantities) from the QR decomposition.

With data:

    set.seed(0)
    xmat <- matrix(rnorm(500), nrow = 50)       ## model matrix
    y <- rowSums(rep(2:11, each = 50) * xmat)   ## arbitrary values to let `lm.wfit` work
    w <- runif(50, 1, 2)                        ## weights

The core of the lwr() function is as follows:

    xmat2 <- w * xmat                       ## rows of xmat scaled by the weights, i.e. W X
    xx    <- solve(crossprod(xmat, xmat2))  ## explicit inverse of X'WX
    xmat1 <- tcrossprod(xx, xmat2)          ## (X'WX)^{-1} X'W
    vmat  <- tcrossprod(xmat1)              ## xmat1 %*% t(xmat1)

I need a value, for example:

    sum((xmat[1, ] %*% xmat1)^2)
    sqrt(diag(vmat))

At the moment I am using reg <- lm.wfit(x = xmat, y = y, w = w), but I cannot see how to recover from it what seems to me to be the hat matrix ( xmat1 ).
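
For reference, a minimal sketch of what lm.wfit() returns (the comments are my reading of the current stats implementation, not something stated above): it is a plain list, and its qr component is the factorization that the answer below builds on.

    reg <- lm.wfit(x = xmat, y = y, w = w)
    names(reg)     ## a plain list; among the components is "qr"
    class(reg$qr)  ## "qr" -- presumably the factorization of sqrt(w) * xmat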

r regression linear-regression lm
1 answer

This old question is a follow-up to another old question that I have just answered: Calculate the projection / hat matrix using QR factorization, SVD (and Cholesky factorization?). That answer discusses three options for computing the hat matrix for an ordinary least squares problem, whereas this question is about weighted least squares. The result and method from that answer will be the basis of my answer here; in particular, I will only demonstrate the QR approach.
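
For context, here is a minimal sketch of the QR route for ordinary least squares as I understand it from that linked answer (the toy data X0 is mine): with a thin QR factorization X0 = Q0 R0, the OLS hat matrix is simply Q0 %*% t(Q0).

    X0  <- matrix(rnorm(50 * 10), nrow = 50)
    QR0 <- qr.default(X0)
    Q0  <- qr.qy(QR0, diag(1, nrow = nrow(X0), ncol = QR0$rank))
    H0  <- tcrossprod(Q0)   ## equals X0 %*% solve(crossprod(X0)) %*% t(X0)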

[image: LaTeX derivation of the weighted least squares hat matrix (not recoverable here)]
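
My reconstruction of the standard derivation that image presumably contained (notation mine): with the positive weights collected in W = diag(w),

    \hat\beta = (X^\top W X)^{-1} X^\top W y, \qquad
    \hat y = X \hat\beta = H y, \qquad
    H = X (X^\top W X)^{-1} X^\top W .

    % With \tilde X = W^{1/2} X and a thin QR factorization \tilde X = Q R:
    \tilde X (\tilde X^\top \tilde X)^{-1} \tilde X^\top = Q Q^\top
    \quad\Longrightarrow\quad
    H = W^{-1/2} Q Q^\top W^{1/2}
      = \underbrace{(W^{-1/2} Q)}_{Q_1} \, \underbrace{(W^{1/2} Q)^\top}_{Q_2^\top} .

This is exactly what the code below constructs: X1 plays the role of X_tilde, Q1 of W^{-1/2} Q and Q2 of W^{1/2} Q.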

The OP mentioned that we can use lm.wfit to obtain the QR factorization, but we can also do the factorization ourselves with qr.default, which is what I will show.


Before continuing, I need to point out that the OP's code is not doing what the OP thinks. xmat1 is not a hat matrix; xmat %*% xmat1 is. vmat is not a hat matrix either, although I don't know what it is really meant to be. And I do not understand what these are supposed to compute:

    sum((xmat[1, ] %*% xmat1)^2)
    sqrt(diag(vmat))

The second one looks like the diagonal of the hat matrix, but, as I said, vmat is not a hat matrix. In any case, I will carry on with the correct computation of the hat matrix and show how to obtain its diagonal and trace.
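
As a quick numerical check of the claim above (my addition; it assumes the data and code blocks from the question have been run): xmat %*% xmat1 behaves like a hat matrix, being idempotent with trace equal to the number of columns of xmat.

    H_op <- xmat %*% xmat1            ## this, not xmat1, is the hat matrix
    max(abs(H_op %*% H_op - H_op))    ## ~ machine precision, so H_op is idempotent
    sum(diag(H_op))                   ## 10, the number of columns of xmat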


Consider a toy model matrix X and some positive weights w (drawn uniformly between 1 and 2):

    set.seed(0)
    X <- matrix(rnorm(500), nrow = 50)
    w <- runif(50, 1, 2)   ## weights must be positive
    rw <- sqrt(w)          ## square root of the weights

First we obtain X1 (X_tilde in the LaTeX derivation above) by rescaling the rows of X :

 X1 <- rw * X 

Then we perform a QR factorization of X1 . As discussed in my linked answer, we can do this factorization either with or without column pivoting. lm.fit and lm.wfit (hence lm ) do not use pivoting, but here I will use the pivoted factorization as a demonstration.

    QR <- qr.default(X1, LAPACK = TRUE)
    Q <- qr.qy(QR, diag(1, nrow = nrow(QR$qr), ncol = QR$rank))

Note that we do not compute tcrossprod(Q) , as was done in the linked answer, because that is for ordinary least squares. For weighted least squares we want Q1 and Q2 :

    Q1 <- (1 / rw) * Q
    Q2 <- rw * Q

If we only need the diagonal and trace of the hat matrix, there is no need to perform a matrix multiplication to obtain the full hat matrix first. We can use

    d <- rowSums(Q1 * Q2)   ## diagonal of the hat matrix
    # [1] 0.20597777 0.26700833 0.30503459 0.30633288 0.22246789 0.27171651
    # [7] 0.06649743 0.20170817 0.16522568 0.39758645 0.17464352 0.16496177
    #[13] 0.34872929 0.20523690 0.22071444 0.24328554 0.32374295 0.17190937
    #[19] 0.12124379 0.18590593 0.13227048 0.10935003 0.09495233 0.08295841
    #[25] 0.22041164 0.18057077 0.24191875 0.26059064 0.16263735 0.24078776
    #[31] 0.29575555 0.16053372 0.11833039 0.08597747 0.14431659 0.21979791
    #[37] 0.16392561 0.26856497 0.26675058 0.13254903 0.26514759 0.18343306
    #[43] 0.20267675 0.12329997 0.30294287 0.18650840 0.17514183 0.21875637
    #[49] 0.05702440 0.13218959

    edf <- sum(d)           ## trace, the sum of the diagonal
    # [1] 10

In linear regression, d gives the leverage of each data point, and it is useful for constructing confidence intervals (using sqrt(d) ) and standardized residuals (using sqrt(1 - d) ). The trace is the effective number of parameters, or effective degrees of freedom, of the model (which is why I call it edf ). We see that edf = 10 , because we fitted 10 parameters: X has 10 columns and is not rank-deficient.
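
A minimal, hedged sketch of how d might be used in practice (the response y1 below is a placeholder of mine, and I assume the usual weighted least squares setting where Var(y_i) = sigma^2 / w_i; none of this is from the original answer):

    y1  <- rnorm(50)                                      ## placeholder response
    fit <- lm.wfit(x = X, y = y1, w = w)
    sigma2 <- sum(w * fit$residuals^2) / (nrow(X) - edf)  ## residual variance estimate
    se_fit <- sqrt(sigma2 * d / w)                        ## std. error of fitted values
    r_std  <- fit$residuals / sqrt(sigma2 * (1 - d) / w)  ## standardized residuals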

Usually d and edf are all we need. Occasionally, however, we want the full hat matrix. To get it, we need an expensive matrix multiplication:

 H <- tcrossprod(Q1, Q2) 
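
An optional sanity check (my addition, not part of the original answer): the QR-based H should agree, up to rounding error, with the textbook formula H = X (X'WX)^{-1} X'W, which is what the solve()-based code in the question computes.

    H_naive <- X %*% solve(crossprod(X, w * X), t(w * X))
    max(abs(H - H_naive))   ## should be on the order of machine precision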

The hat matrix is particularly useful for helping us understand whether a model is local or not. Let's plot this matrix (read ?image for details and examples on how to plot a matrix in the correct orientation):

 image(t(H)[ncol(H):1,]) 

[image: heat map of the hat matrix H produced by image(); the matrix is visibly dense]

We see that this matrix is completely dense. This means that the prediction at each point depends on all of the data, i.e., the prediction is not local. If we compared with other nonparametric methods, such as kernel regression, loess, P-splines (penalized B-spline regression) and wavelets, we would observe a sparse hat matrix. That is why those methods are known as local fitting methods.
