How to interpret the basics of a logistic regression calibration plot

First-time post to Stack Overflow. I would like help interpreting the calibration plot of a logistic regression model. I understand what the "Ideal" line means, but not the "Bias-corrected" or "Apparent" lines. Also, what do the ticks along the top x-axis mean?

I have read through the documentation for rms::calibrate() but need a more basic explanation.

calibration.Model.A <- plot(
  rms::calibrate(m.A, cmethod = "boot", B = 1000,
                 legend = TRUE, digits = 3,
                 subtitles = TRUE),
  xlab = "Predicted probability according to model",
  ylab = "Observation Proportion of Matching"
)

Calibration plot I created but do not understand.

The ticks along the top x-axis show the frequency distribution of the predicted probabilities (often called a rug plot). They let you see where predictions are sparse and where they are concentrated across the range of predicted probabilities.
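To see what those ticks summarize, here is a minimal base R sketch. The data are simulated and the names (`x`, `y`, `fit`, `p_hat`) are made up for illustration; it is not your model. It counts how many predicted probabilities fall in each tenth of the 0 to 1 range, and in an interactive session it draws a histogram with a rug along the top.

```r
# Simulated data (an assumption for illustration only, not the asker's data)
set.seed(3)
n <- 300
x <- rnorm(n, mean = 1)
y <- rbinom(n, 1, plogis(1 + x))

# Plain logistic regression and its in-sample predicted probabilities
fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# How many predictions land in each tenth of the probability scale
bin_counts <- table(cut(p_hat, breaks = seq(0, 1, by = 0.1),
                        include.lowest = TRUE))
print(bin_counts)

# Draw the distribution only when a graphics device is available interactively
if (interactive()) {
  hist(p_hat, breaks = 20, xlim = c(0, 1), main = "",
       xlab = "Predicted probability")
  rug(p_hat, side = 3)  # side = 3 puts the tick marks along the top axis
}
```

Bins with few predictions are the regions where the calibration curve rests on little data and should be read with more caution.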

The "Apparent" line is essentially the in-sample calibration: the calibration curve estimated on the same data that were used to fit the model.

The "Ideal" line represents perfect calibration, where the predicted probabilities equal the observed probabilities.

The "Bias Corrected" line is derived via a resampling procedure (the bootstrap, in your call) that adjusts for "optimistic" (better than actual) calibration, which is an artifact of evaluating the model on the same data used to fit it. It gives an idea of how the model might perform "out-of-sample", so this is the line we want to look at to judge generalization (until we have new data to try the model on).
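The idea behind the correction can be sketched in base R. This is a simplified illustration of an optimism bootstrap under my own assumptions (simulated data, a lowess-smoothed calibration curve, a hypothetical helper `cal_curve`, and only 200 resamples); it is not the actual code inside rms::calibrate().

```r
# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x2))
dat <- data.frame(y, x1, x2)

# Hypothetical helper: smoothed observed proportion evaluated on a grid
cal_curve <- function(p, y, grid) {
  sm <- lowess(p, y, iter = 0)
  approx(sm$x, sm$y, xout = grid, rule = 2, ties = mean)$y
}

# Apparent calibration: same data used for fitting and for evaluation
fit <- glm(y ~ x1 + x2, family = binomial, data = dat)
p_hat <- fitted(fit)
grid <- seq(unname(quantile(p_hat, 0.05)), unname(quantile(p_hat, 0.95)),
            length.out = 50)
apparent <- cal_curve(p_hat, dat$y, grid)

# Optimism: how much better a refitted model looks on its own bootstrap
# sample than on the original data, averaged over resamples
B <- 200
optimism <- matrix(NA_real_, nrow = B, ncol = length(grid))
for (b in seq_len(B)) {
  boot <- dat[sample(n, replace = TRUE), ]
  fit_b <- glm(y ~ x1 + x2, family = binomial, data = boot)
  in_boot <- cal_curve(fitted(fit_b), boot$y, grid)
  in_orig <- cal_curve(predict(fit_b, newdata = dat, type = "response"),
                       dat$y, grid)
  optimism[b, ] <- in_boot - in_orig
}

# Bias-corrected curve: apparent curve minus the average optimism
bias_corrected <- apparent - colMeans(optimism)
head(data.frame(predicted = grid, apparent, bias_corrected))
```

The gap between `apparent` and `bias_corrected` is the estimated optimism; the more the model overfits, the larger that gap tends to be.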

When the "Apparent" or "Bias Corrected" line is above the "Ideal" line, the observed proportion is higher than the predicted probability, so the model underpredicts in that range of predicted probabilities. When either line is below the "Ideal" line, the model overpredicts in that range of predicted probabilities.
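As a small illustration of that rule, the sketch below labels each point of a calibration curve by comparing it with the Ideal line. The numbers are hypothetical values I made up to show the logic; they are not read off your plot.

```r
# Hypothetical calibration curve values (made up for illustration)
predicted  <- c(0.10, 0.20, 0.30, 0.50, 0.70, 0.90)
calibrated <- c(0.18, 0.27, 0.33, 0.50, 0.68, 0.85)  # smoothed observed proportion

# Curve above the Ideal line (calibrated > predicted) means underprediction;
# curve below it means overprediction
direction <- ifelse(calibrated > predicted, "underpredicts",
             ifelse(calibrated < predicted, "overpredicts", "on the ideal line"))

print(data.frame(predicted, calibrated, direction))
```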

Applying this to your specific plot, most of the predicted probabilities appear to be at the higher end (per the rug plot). The model overall appears to be reasonably well calibrated, since the Bias-Corrected line closely follows the Ideal line; there is some underprediction at lower predicted probabilities, where the Bias-Corrected line sits above the Ideal line for predicted probabilities below about 0.3.

The mean absolute error is the "average" absolute difference (disregarding whether an error is positive or negative) between predicted probability and actual probability. Ideally, we want this to be small (0 would be perfect, indicating no error). It seems small in your plot, but how small is small enough depends on the situation. The other measure that Frank Harrell's rms package returns is the 90th percentile absolute error (90% of the errors are smaller than this number); this should be looked at as well.
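Both summaries are easy to compute by hand. Here is a rough base R sketch on simulated data (again an assumption, not your model). It uses the apparent lowess calibration curve rather than the bias-corrected one, so the numbers it gives would not match what rms prints under the plot.

```r
# Simulated data (an assumption for illustration only)
set.seed(2)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 1.2 * x))

fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# Smoothed observed proportion at each observation's predicted probability
sm <- lowess(p_hat, y, iter = 0)
calibrated <- approx(sm$x, sm$y, xout = p_hat, ties = mean)$y

# Absolute calibration error per observation, then the two summaries
abs_err <- abs(p_hat - calibrated)
mae <- mean(abs_err)                    # mean absolute error
q90 <- unname(quantile(abs_err, 0.9))   # 90th percentile absolute error

print(c(mean_absolute_error = mae, quantile_0.9 = q90))
```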
