I know some well-known measures are the $c$ statistic and the Kolmogorov-Smirnov $D$ statistic. However, as far as I know, these statistics take into account only the rank order of the observations and are invariant to changes in the intercept of the logistic regression model (e.g. in an oversampling-correction exercise).

In my current application, I need to rely on the accuracy of the logistic regression's predicted **probability of event**. I know of only a qualitative way of assessing a model's probability predictions, namely plotting a "Q-Q plot" of the actual vs. predicted probability of event:

- Score the validation dataset using the developed model.
- Rank the observations according to the predicted probability and group into $n$ buckets according to their rank of predicted probability. (First 1/n would go to the first bucket, next 1/n would go to the next …)
- Calculate the average predicted and actual probability of Event for each bucket.
- Create a scatter plot of Predicted vs Actual – one point for each bucket.
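
The bucketing steps above can be sketched in a few lines of Python. This is only a minimal illustration, not code from any particular library; the function name `calibration_bins` and its arguments are hypothetical, and it assumes the outcomes are coded 0/1 and the predictions are probabilities from an already-fitted model.

```python
def calibration_bins(y_true, p_pred, n_bins=10):
    """Return one (mean predicted, observed event rate, count) row per bucket.

    Observations are ranked by predicted probability and split into
    n_bins buckets of (nearly) equal size.
    """
    pairs = sorted(zip(p_pred, y_true))  # rank by predicted probability
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        bucket = pairs[lo:hi]
        if not bucket:  # can happen when n < n_bins
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        obs_rate = sum(y for _, y in bucket) / len(bucket)
        rows.append((mean_pred, obs_rate, len(bucket)))
    return rows
```

Plotting the first two columns against each other gives the scatter plot described in the last step; points close to the diagonal indicate that predicted and observed event rates agree within the buckets.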

I am wondering:

- Is the "Q-Q plot" I mentioned above a legitimate way to assess the predictive performance of models developed from logistic regression? If so, where may I find more references for that?
- Is there any known quantitative way to assess the probability prediction ability of this kind of model?

#### Best Answer

There are many good ways to do it. Here are some examples. These methods are implemented in the R `rms` package (functions `val.prob`, `calibrate`, `validate`):

- loess nonparametric full-resolution calibration curve (no binning)
- Spiegelhalter's test
- Brier score (a proper accuracy score – quadratic score)
- Generalized $R^2$ (a proper accuracy score related to deviance)
- Calibration slope and intercept
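
To make two of these measures concrete, here is a rough Python sketch of the Brier score and of the calibration intercept and slope. It is a hand-rolled illustration, not how `rms` computes them; the function names are hypothetical. It assumes 0/1 outcomes, and it takes the calibration intercept and slope to be the coefficients $a$ and $b$ from refitting $\text{logit}\,P(Y=1) = a + b \cdot \text{logit}(\hat{p})$ on the validation data, so that $a = 0$, $b = 1$ corresponds to perfect calibration on the logit scale.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(p_pred, y_true)) / len(y_true)

def _logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # keep the log finite for p of exactly 0 or 1
    return math.log(p / (1 - p))

def calibration_intercept_slope(y_true, p_pred, max_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p_pred) by Newton-Raphson."""
    x = [_logit(p) for p in p_pred]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:  # singular information matrix, e.g. all predictions equal
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

A slope below 1 suggests predictions that are too extreme (typical of overfitting), while a nonzero intercept points to a systematic shift of the kind an uncorrected oversampling adjustment would produce. The sketch does not guard against perfect separation, where the maximum likelihood estimates do not exist.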

For comparing two models with regard to discrimination, the likelihood ratio $\chi^2$ test is the gold standard.

Four of the above approaches, and other approaches, are covered in the 2nd edition of my book *Regression Modeling Strategies* (coming in 2015-09) and in my course notes that go along with the book, available from the handouts link at https://biostat.app.vumc.org/wiki/Main/RmS .

The Brier score can be decomposed into discrimination and calibration components. Along with the Brier score and Spiegelhalter's test, the nonparametric calibration curve can detect errors in the intercept.
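
As an illustration of such a decomposition, the sketch below uses Murphy's partition of the Brier score into reliability (a calibration term), resolution (a discrimination-type term) and uncertainty, with $\text{Brier} = \text{reliability} - \text{resolution} + \text{uncertainty}$. The choice of this particular decomposition is an assumption; other partitions exist. The function name is hypothetical, and the identity is exact here only because observations are grouped by distinct predicted values.

```python
from collections import defaultdict

def murphy_decomposition(y_true, p_pred):
    """Return (reliability, resolution, uncertainty) for 0/1 outcomes.

    Observations are grouped by their exact predicted probability.
    """
    n = len(y_true)
    base_rate = sum(y_true) / n
    groups = defaultdict(list)
    for p, y in zip(p_pred, y_true):
        groups[p].append(y)
    reliability = 0.0
    resolution = 0.0
    for p, ys in groups.items():
        obs = sum(ys) / len(ys)  # observed event rate in this group
        reliability += len(ys) * (p - obs) ** 2
        resolution += len(ys) * (obs - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty
```

Shifting every prediction by a wrong intercept leaves the grouping, and hence the resolution, unchanged but increases the reliability term, which is how the Brier score picks up intercept errors that rank-based measures miss.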
