I was trying to determine the biasing constant in ridge regression when I came across a phenomenon that seems quite puzzling, to me at least. I let the GCV criterion choose a constant for me and then I got the Variance Inflation Factors of the new model by computing
$$ \left( \mathbf{R_{XX}} + c\mathbf{I} \right)^{-1} \mathbf{R_{XX}} \left( \mathbf{R_{XX}} + c\mathbf{I} \right)^{-1} $$
and extracting the diagonal elements of this matrix. What I found puzzling was the fact that these VIFs were very close to zero. It seems to me that that would require negative $R^2$s, no? I know that this can happen occasionally, for example in Regression Through the Origin, but I cannot quite justify it in this context.
I am wondering then, what does a VIF close to zero mean? Then, would my choice of this constant be acceptable or should I look for another solution that keeps the VIFs close to 1, as they ought to be in the absence of multicollinearity?
Best Answer
I suggest calculating the diagonal elements of this matrix directly.
Assume that the design matrix is centered and scaled.
We can use the eigenvalue decomposition $R_{XX}=X'X=T\Lambda T'$, where $T$ is orthogonal and $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_i$.
$\begin{align}
(R_{XX}+cI)^{-1}R_{XX}(R_{XX}+cI)^{-1} &= (R_{XX}+cI)^{-1}(R_{XX}+cI)(R_{XX}+cI)^{-1}-c(R_{XX}+cI)^{-1}(R_{XX}+cI)^{-1} \\
&= (R_{XX}+cI)^{-1}-c(R_{XX}+cI)^{-1}(R_{XX}+cI)^{-1} \\
&= (T\Lambda T'+cTT')^{-1}-c(T\Lambda T'+cTT')^{-1}(T\Lambda T'+cTT')^{-1} \\
&= T\left( (\Lambda+cI)^{-1}-c(\Lambda+cI)^{-1}(\Lambda+cI)^{-1} \right)T'
\end{align}$
The matrix $(\Lambda+cI)^{-1}$ is a diagonal matrix whose $i$th diagonal element is $\frac{1}{\lambda_i+c}$.
So the matrix $(\Lambda+cI)^{-1}-c(\Lambda+cI)^{-1}(\Lambda+cI)^{-1}$ is also diagonal, and its $i$th diagonal element is $\frac{1}{\lambda_i+c}-\frac{c}{(\lambda_i+c)^2}=\frac{\lambda_i}{(\lambda_i+c)^2}$.
In OLS, it is known that the VIFs are the diagonal elements of the matrix $T\Lambda^{-1}T'$. Comparing $\Lambda^{-1}$ with its ridge counterpart $(\Lambda+cI)^{-1}-c(\Lambda+cI)^{-1}(\Lambda+cI)^{-1}$, every diagonal element in the ridge case is deflated by the factor $\frac{\lambda_i^2}{(\lambda_i+c)^2}$.
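As a quick numerical check of the decomposition above, here is a minimal NumPy sketch. The simulated predictors, the seed, and the constant `c = 0.5` are illustrative assumptions, not values from the question. It computes the ridge VIFs both from the original matrix product and from the eigenvalue form, and compares them with the OLS VIFs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the third predictor is nearly a linear
# combination of the first two, so the OLS VIFs are large.
n = 200
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z[:, 0], Z[:, 1],
                     Z[:, 0] + Z[:, 1] + 0.1 * rng.normal(size=n)])

# Center and scale so that X'X is the correlation matrix R_XX.
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))
R = X.T @ X

c = 0.5  # illustrative ridge constant (assumption)
I = np.eye(R.shape[0])

# Ridge VIFs from the matrix product in the question.
A = np.linalg.inv(R + c * I)
vif_direct = np.diag(A @ R @ A)

# Ridge VIFs from the eigenvalue form T diag(lambda_i / (lambda_i + c)^2) T'.
lam, T = np.linalg.eigh(R)
vif_eigen = np.diag(T @ np.diag(lam / (lam + c) ** 2) @ T.T)

# OLS VIFs: diagonal of R_XX^{-1} = T Lambda^{-1} T'.
vif_ols = np.diag(np.linalg.inv(R))

print("ridge (direct):", vif_direct)
print("ridge (eigen): ", vif_eigen)
print("OLS:           ", vif_ols)
```

Both ridge computations should agree up to floating-point error, and each ridge VIF should be smaller than the corresponding OLS VIF.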
So I think we can conclude that the larger the ridge constant, the more deflated the VIFs become.
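To see this deflation on a small example, the sketch below evaluates the eigenvalue form for a growing ridge constant. The 3x3 correlation matrix, the helper name `ridge_vifs`, and the grid of constants are all hypothetical choices for illustration. Each VIF shrinks as $c$ grows and eventually falls below 1, which suggests that a near-zero ridge VIF reflects shrinkage and need not be read through the OLS identity $1/(1-R^2)$.

```python
import numpy as np

def ridge_vifs(R, c):
    """Diagonal of (R + cI)^{-1} R (R + cI)^{-1} via the eigenvalues of R."""
    lam, T = np.linalg.eigh(R)
    # The i-th diagonal entry of T D T' is sum_j T[i, j]^2 * D[j, j].
    return (T ** 2) @ (lam / (lam + c) ** 2)

# Hypothetical positive definite correlation matrix with two
# strongly correlated predictors.
R = np.array([[1.0, 0.9, 0.5],
              [0.9, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

constants = [0.0, 0.1, 1.0, 10.0]  # c = 0 reproduces the OLS VIFs
vifs = np.array([ridge_vifs(R, c) for c in constants])

for c, v in zip(constants, vifs):
    print(f"c = {c:5.1f}  VIFs = {np.round(v, 4)}")
```

Because $\frac{\lambda_i}{(\lambda_i+c)^2}$ is decreasing in $c$ for every positive eigenvalue, each row of the printed table is smaller than the one before it.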
I am not a native English speaker. Please don't mind my awkward expressions and it would be nice of you if you correct my grammar errors. Thank you.