# Use squared correlation in regression without intercept

If I want to compare the goodness of fit of two regression models, one with and one without an intercept, is it valid to compare the squared correlation coefficient between the fitted values and the observed data? Since the squared correlation recovers \$R^2\$ for the model with intercept, it seems natural to compute the squared correlation for the model without intercept and use that to make the comparison. I am not entirely sure whether this is legitimate. Am I missing anything? If this approach is not valid, what methods can I use to make such comparisons?


#### Best Answer

\$R^2 = 1-SS_{residuals}/SS_{total}\$, where "SS" is "sum of squares".

In the model with intercept included, \$SS_{total}\$ is the sum of squares about the dependent variable's mean. In the model with intercept suppressed, \$SS_{total}\$ is the sum of squares about 0, i.e. the sum of squares in the non-centered dependent variable. Therefore, one cannot directly compare the two R-squares.
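The difference between the two baselines can be seen in a short sketch. The data below are hypothetical (names and the simulated model are illustrative, not from the question); the point is only that the same \$1 - SS_{residuals}/SS_{total}\$ formula uses a different \$SS_{total}\$ in each case:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, size=100)           # hypothetical predictor
y = 3.0 + 2.0 * x + rng.normal(0, 1, 100)    # hypothetical response

# Model WITH intercept: design matrix [1, x]
X1 = np.column_stack([np.ones_like(x), x])
fit1 = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

# Model WITHOUT intercept: design matrix [x]
X0 = x[:, None]
fit0 = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]

# With intercept, SS_total is taken about the mean of y ...
r2_with = 1 - np.sum((y - fit1) ** 2) / np.sum((y - y.mean()) ** 2)
# ... without intercept, SS_total is taken about zero instead,
# which is typically a much larger denominator
r2_without = 1 - np.sum((y - fit0) ** 2) / np.sum(y ** 2)

print(r2_with, r2_without)
```

Because the no-intercept denominator (sum of squares about 0) is usually far larger than the centered one, the no-intercept R-square often comes out higher even though the model fits no better, which is exactly why the two numbers are not comparable.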

Update. Now to the actual question. It amounts to this: when there is an intercept, R-square equals the squared correlation between the predicted and observed values; when there is no intercept, it does not: that squared correlation is a different quantity from R-square. Can we use this correlation in place of R-square to compare a model with an intercept against a model without one?

If both models are otherwise the same (same set of IVs) and differ only in the intercept, the two "predicted with observed" correlations will be the same, because in both cases the predicted values are linear combinations of the same terms (only the coefficients differ). The correlations are equal despite the fact that the no-intercept model a priori fits worse than the intercept model. So the answer to the question is no, we should not use that correlation.
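This insensitivity can be demonstrated in the single-predictor case, where the equality is exact. A minimal sketch (simulated data, illustrative only): the squared correlations coincide even though the residual sums of squares show the no-intercept model fitting worse.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 1.0, size=100)           # hypothetical single predictor
y = 3.0 + 2.0 * x + rng.normal(0, 1, 100)    # hypothetical response

# Fitted values with intercept (design [1, x]) ...
X1 = np.column_stack([np.ones_like(x), x])
fit_with = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
# ... and without intercept (closed-form no-intercept OLS slope)
fit_without = x * (x @ y) / (x @ x)

# Both fitted series are linear functions of the same predictor,
# so their squared correlations with y coincide
r_with = np.corrcoef(fit_with, y)[0, 1]
r_without = np.corrcoef(fit_without, y)[0, 1]
print(r_with ** 2, r_without ** 2)

# Yet the residual sums of squares differ: the no-intercept model fits worse
print(np.sum((y - fit_with) ** 2), np.sum((y - fit_without) ** 2))
```

The identical squared correlations alongside the unequal residual sums of squares show why this correlation cannot serve as a goodness-of-fit criterion for the comparison.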
