I know that some well-known measures are the $c$ statistic and the Kolmogorov-Smirnov $D$ statistic. However, as far as I know, those statistics take into account only the rank order of the observations and are invariant to changes in the intercept of the logistic regression model (e.g. in an oversampling-correction exercise).
In my current application, I need to rely on the accuracy of the event probabilities predicted by the logistic regression. I know of only a qualitative way of assessing a model's ability to predict probabilities, namely plotting a "QQ-plot" of the actual vs. predicted probability of the event:
- Score the validation dataset using the developed model.
- Rank the observations by predicted probability and group them into $n$ buckets according to that rank. (The first $1/n$ of the observations go to the first bucket, the next $1/n$ to the next bucket, and so on.)
- Calculate the average predicted and actual probability of Event for each bucket.
- Create a scatter plot of Predicted vs Actual – one point for each bucket.
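The steps above can be sketched in plain Python as follows. This is only an illustration under assumptions: the function name `binned_calibration` is hypothetical, and the validation data are simulated so that the predictions are well calibrated by construction.

```python
import random

def binned_calibration(y_true, p_pred, n_buckets=10):
    """Return (mean predicted, observed event rate, count) per rank-ordered bucket."""
    pairs = sorted(zip(p_pred, y_true))  # rank observations by predicted probability
    n = len(pairs)
    rows = []
    for b in range(n_buckets):
        lo = b * n // n_buckets
        hi = (b + 1) * n // n_buckets
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # can happen only if n_buckets exceeds the sample size
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows

# Simulated validation set (an assumption for illustration):
# each outcome is drawn with exactly the predicted probability.
random.seed(0)
p_pred = [random.random() for _ in range(5000)]
y_true = [1 if random.random() < p else 0 for p in p_pred]

table = binned_calibration(y_true, p_pred, n_buckets=10)
for mean_pred, obs_rate, count in table:
    print(f"predicted={mean_pred:.3f}  actual={obs_rate:.3f}  n={count}")
```

Plotting the first column against the second gives one point per bucket; points close to the 45-degree line suggest good calibration, though the picture depends on the choice of `n_buckets`.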
I am wondering:
- Is the "Q-Q plot" I described above a legitimate way to assess the predictive performance of models developed from logistic regression? If so, where can I find more references on it?
- Is there any known quantitative way to assess the probability prediction ability of this kind of model?
There are many good ways to do it. Here are some examples. These methods are implemented in functions of the R rms package:
- loess nonparametric full-resolution calibration curve (no binning)
- Spiegelhalter's test
- Brier score (a proper accuracy score – quadratic score)
- Generalized $R^2$ (a proper accuracy score related to deviance)
- Calibration slope and intercept
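As a rough illustration of the last item, one common way to obtain a calibration intercept and slope is to regress the observed outcome on the logit of the predicted probability with a logistic model. The sketch below does that with a hand-written Newton-Raphson fit in plain Python; it is not the rms implementation, the function name `calibration_intercept_slope` is hypothetical, and the data are simulated.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibration_intercept_slope(y_true, p_pred, n_iter=25):
    """Fit y ~ a + b * logit(p_pred) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in p_pred]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            r = yi - mu
            g0 += r            # score for the intercept
            g1 += r * xi       # score for the slope
            h00 += w           # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Simulated data (an assumption for illustration): z is the true linear predictor.
random.seed(1)
z = [random.gauss(0.0, 1.0) for _ in range(5000)]
y = [1 if random.random() < expit(zi) else 0 for zi in z]

p_shifted = [expit(zi + 1.0) for zi in z]  # intercept too high by 1 on the logit scale
p_overfit = [expit(2.0 * zi) for zi in z]  # predictions too extreme by a factor of 2

a_shift, b_shift = calibration_intercept_slope(y, p_shifted)
a_over, b_over = calibration_intercept_slope(y, p_overfit)
print(f"shifted intercept: a={a_shift:.2f}, b={b_shift:.2f}")
print(f"too extreme:       a={a_over:.2f}, b={b_over:.2f}")
```

Perfect calibration corresponds to an intercept of 0 and a slope of 1. In this simulation the shifted predictions should give an intercept near -1 with a slope near 1, and the overly extreme predictions a slope near 0.5, up to sampling error.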
For comparing two models with regard to discrimination, the likelihood ratio $\chi^2$ test is the gold standard.
Four of the above approaches, and other approaches, are covered in the 2nd edition of my book Regression Modeling Strategies (coming in 2015-09) and in my course notes that go along with the book, available from the handouts link at https://biostat.app.vumc.org/wiki/Main/RmS .
The Brier score can be decomposed into discrimination and calibration components. Along with the Brier score and Spiegelhalter's test, the nonparametric calibration curve can detect errors in the intercept.
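To make the point about the intercept concrete, here is a small Python sketch on simulated data. It shifts the intercept of a correctly specified model by 1 on the logit scale and compares the $c$ statistic and the Brier score before and after. The helper names `brier_score` and `c_statistic` and the simulation settings are assumptions made for illustration.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(p_pred, y_true)) / len(y_true)

def c_statistic(y_true, p_pred):
    """Concordance probability via the Mann-Whitney rank-sum formula with midranks."""
    order = sorted(range(len(p_pred)), key=lambda i: p_pred[i])
    ranks = [0.0] * len(p_pred)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p_pred[order[j + 1]] == p_pred[order[i]]:
            j += 1
        mid = (i + j) / 2.0 + 1.0  # average 1-based rank within a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    n1 = sum(y_true)
    n0 = len(y_true) - n1
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n1 * (n1 + 1) / 2.0) / (n1 * n0)

# Simulated data (an assumption for illustration): z is the true linear predictor.
random.seed(2)
z = [random.gauss(0.0, 1.0) for _ in range(5000)]
y = [1 if random.random() < expit(zi) else 0 for zi in z]

p_good = [expit(zi) for zi in z]           # correct intercept
p_shifted = [expit(zi + 1.0) for zi in z]  # intercept off by +1 on the logit scale

print("c statistic:", c_statistic(y, p_good), c_statistic(y, p_shifted))
print("Brier score:", brier_score(y, p_good), brier_score(y, p_shifted))
```

Because the shift preserves the rank order of the predictions, the two $c$ statistics agree, while the Brier score gets worse for the shifted predictions.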