I have a logit model and am trying to understand and compare the values it predicts with the observed values. Let's say the data set has 100 values: I generate all the predicted probabilities, and then I find the actual probabilities from the data set.
If I'm comparing the predicted vs observed values, I'm thinking there are two ways to do it. One is to do it value by value, while the second would be to group by the 'predicted probabilities.'
Method 1:
x_value  pred_val  obs_val
100      0.30      0.34
102      0.33      0.36
104      0.35      0.37
106      0.40      0.40
...
I'm also thinking there has to be some way to aggregate these values. For example, I could take all x values whose predicted probability is between 10% and 20%, then find the average predicted value for that range, followed by the average observed value for that range.
Method 2:
Pred_probs       pred_val  obs_val
10 to 20% vals   0.10      0.11
21 to 30% vals   0.12      0.16
31 to 50% vals   0.15      0.30
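To make the aggregation in Method 2 concrete, here is a minimal sketch in base R. The data are simulated, and the variable names and bin cut points are assumptions chosen only for illustration; none of them come from the model described above.

```r
# Simulated data; every name and number here is illustrative only
set.seed(1)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))

fit  <- glm(y ~ x, family = binomial)   # plain logit model
pred <- fitted(fit)                     # predicted probabilities

# Group observations by predicted probability (arbitrary cut points)
bins <- cut(pred, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1), include.lowest = TRUE)

# Per bin: count, mean predicted probability, observed event proportion
binned <- data.frame(
  n        = as.vector(table(bins)),
  pred_val = as.vector(tapply(pred, bins, mean)),
  obs_val  = as.vector(tapply(y, bins, mean))
)
rownames(binned) <- levels(bins)
print(binned)
```

Bins that happen to contain no observations show `NA` in both summary columns. The table also depends on where the cut points are placed, which is one reason to treat it as a rough summary only.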
What I'm wondering is:
When there are a large number of data points, what use is having a list of the predicted and observed values for any given value of x?
Does it ever make sense to do something as identified in 'Method 2'?
Best Answer
It sounds as if you want to check the calibration of a model on the same dataset used to build the model. This requires the bootstrap, which re-fits the model on each resample (300 times here). You can use a bootstrap overfitting-corrected calibration curve estimated with a nonparametric smoother. It is not a good idea to bin predicted probabilities. Assuming you did no variable selection, here's an approach in R with the rms package.
    require(rms)
    f <- lrm(y ~ x1 + x2 + x3, x=TRUE, y=TRUE)  # Full pre-specified model
    validate(f, B=300)                          # bootstrap stats such as Somers' Dxy
    cal <- calibrate(f, B=300)
    plot(cal)
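To show what a smoothed, unbinned calibration curve looks like, here is a rough base-R sketch on simulated data. It is not what `calibrate()` does internally: it draws only the apparent curve, with no bootstrap correction for overfitting, and the data, coefficients, and variable names are assumptions made for the example.

```r
# Simulated data; names and coefficients are illustrative only
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.3 + 0.8 * x1 - 0.5 * x2))

fit  <- glm(y ~ x1 + x2, family = binomial)
pred <- fitted(fit)   # predicted probabilities

# Smooth the 0/1 outcome against the predicted probability.
# iter = 0 switches off lowess's robustness iterations, which would
# otherwise downweight the binary outcomes as if they were outliers.
smooth <- lowess(pred, y, iter = 0)

plot(smooth, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Predicted probability",
     ylab = "Smoothed observed proportion")
abline(0, 1, lty = 2)   # line of perfect calibration
```

Because the curve is estimated on the same data used to fit the model, it tends to look better than it would on new data. The bootstrap in `calibrate()` is what corrects for that optimism.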