When evaluating a regression model with cross-validation, I thought the meaningful measure would be the MSE divided by the MSE of the null model, which always predicts the mean:

$\frac{\hat E[(y-\hat{y})^2]}{\hat E[(y-\bar{y})^2]}$. This is 1 if the model adds nothing and 0 if the prediction is perfect (and it can even be greater than 1 if the model is actively harmful). To make it more interpretable, I can flip it around:

$1 - \frac{\hat E[(y-\hat{y})^2]}{\hat E[(y-\bar{y})^2]}$.

This can even be negative if the prediction is worse than the null model.
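As a concrete illustration, the measure can be computed from out-of-fold predictions. The sketch below is not from the original post; it uses a hypothetical synthetic dataset and a plain least-squares fit inside a manual 5-fold split, purely to show where each quantity comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: y depends linearly on x plus noise.
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# 5-fold cross-validated predictions from a simple least-squares fit.
k = 5
folds = np.array_split(rng.permutation(n), k)
y_hat = np.empty(n)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    # Fit slope and intercept on the training folds only.
    A = np.column_stack([x[train_idx], np.ones(train_idx.size)])
    coef, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
    y_hat[test_idx] = coef[0] * x[test_idx] + coef[1]

mse_model = np.mean((y - y_hat) ** 2)
mse_null = np.mean((y - y.mean()) ** 2)  # null model: always predict the mean

score = 1.0 - mse_model / mse_null
print(score)  # near 1 for a good model, <= 0 if worse than the null model
```

Note that the null-model MSE here is computed on the same observations, so the ratio compares the model directly against the "always predict the mean" baseline.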

I have seen people use

$1 - \frac{\operatorname{var}(y-\hat{y})}{\operatorname{var}(y)}$

and call it explained variance, but this seems too generous to the model, as it does not penalize additive or multiplicative biases.
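The difference between the two measures is easy to demonstrate with a deliberately biased predictor. In this sketch (my own illustration, not from the post), the predictions are the true values shifted by a constant, so the variance-based score is perfect while the MSE-based score is strongly negative:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1000)
y_hat = y + 3.0  # perfect shape, but a constant additive bias

# Variance-based "explained variance": the constant bias cancels inside var().
ev = 1.0 - np.var(y - y_hat) / np.var(y)

# MSE-based measure: the squared bias stays in the numerator.
q2 = 1.0 - np.mean((y - y_hat) ** 2) / np.mean((y - y.mean()) ** 2)

print(ev)  # exactly 1.0 despite the bias
print(q2)  # strongly negative: worse than predicting the mean
```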

What is the measure I have above called? Is there a reason why it is not used or have I just missed the relevant examples?


#### Best Answer

Thanks to Mario Figueiredo, who commented on my blog, I found the answer: this measure is called $R^2_{CV}$ or $Q^2$.

*Reference*: Nguyen T. Quan, "The Prediction Sum of Squares as a General Measure for Regression Diagnostics," *Journal of Business & Economic Statistics*, Vol. 6, No. 4 (Oct. 1988), pp. 501–504.

### Similar Posts:

- Solved – How to use k-fold cross-validation to determine whether a linear regression model performs significantly better than chance
- Solved – How to evaluate stacking ensemble model vs. other models with 10-fold cross-validation
- Solved – How does k-fold cross validation overcome overfitting in deep neural networks?
- Solved – Choosing inner cross validation strategy for modeling time series data