How to visualize Bayesian goodness of fit for logistic regression

For a Bayesian logistic regression problem, I have created a posterior predictive distribution. I sample from the predictive distribution and obtain thousands of binary samples (0 or 1) for each observation I have. The resulting goodness-of-fit plots are not very informative, for example:

[Figure: posterior predictive samples for a single observation, with the observed value marked by a red line]

This plot shows the 10 000 samples plus the observed data point (at the far left one can make out a red line; that is the observation). The problem is that this plot is hardly informative, and I'll have 23 of them, one for each data point.

Is there a better way to visualize the 23 data points plus their posterior samples?

Another attempt:

[Figure: second attempt]

Another attempt based on the paper here

[Figure: third attempt]

I have a feeling you're not quite giving us all the details of your situation, but given what we have in front of us, let's consider the utility of a simple dot plot for displaying the information.

Dot Plot

The only real things to note here (that perhaps aren't default behaviors) are:

  • I used redundant encodings, shape and color, to discriminate between the observed values of no defects and defects. With such simple information, placing a separate dot for the observed value on the graph is not necessary. A separate dot also causes a problem when the point is near the middle values: it takes more look-up to see whether the observed value is zero or one.
  • I sorted the graphic according to observed proportion.

Sorting is the real kicker for dot plots like these. Sorting by the proportion here makes it easy to spot observations with high residuals. Having a system where you can easily sort either by values contained in the plot or by external characteristics of the cases is the best way to get the most bang for your buck.
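To make the idea concrete, here is a minimal sketch of such a sorted dot plot in Python. It is not the code behind the figure above. It assumes the "proportion" is the share of posterior predictive draws equal to 1 for each case, and the names `summarize_predictions` and `plot_sorted_dotplot`, as well as the simulated data, are invented for illustration.

```python
import random


def summarize_predictions(samples, observed):
    """Return (case index, proportion of draws equal to 1, observed value)
    for every case, sorted by that proportion."""
    rows = []
    for i, (draws, y) in enumerate(zip(samples, observed)):
        prop = sum(draws) / len(draws)  # share of posterior predictive draws that are 1
        rows.append((i, prop, y))
    return sorted(rows, key=lambda r: r[1])


def plot_sorted_dotplot(rows, path="dotplot.png"):
    """Draw one dot per case; shape and color both encode the observed value."""
    # Imported here so the summary step above has no plotting dependency.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 6))
    for pos, (idx, prop, y) in enumerate(rows):
        ax.scatter(
            prop,
            pos,
            marker="o" if y == 1 else "^",               # redundant encoding: shape
            color="tab:red" if y == 1 else "tab:blue",   # redundant encoding: color
        )
    ax.set_yticks(range(len(rows)))
    ax.set_yticklabels([str(idx) for idx, _, _ in rows])
    ax.set_xlim(-0.05, 1.05)
    ax.set_xlabel("proportion of posterior predictive draws equal to 1")
    ax.set_ylabel("case (sorted by proportion)")
    fig.savefig(path)
    plt.close(fig)


# Simulated stand-in data (an assumption): 23 cases, 1000 binary draws each.
rng = random.Random(0)
true_p = [rng.random() for _ in range(23)]
observed = [1 if rng.random() < p else 0 for p in true_p]
samples = [[1 if rng.random() < p else 0 for _ in range(1000)] for p in true_p]

rows = summarize_predictions(samples, observed)
```

Calling `plot_sorted_dotplot(rows)` writes the figure to a file. Because the rows are sorted, an observed 1 near the bottom of the plot (low proportion) or an observed 0 near the top (high proportion) stands out as a high-residual case.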

This advice extends to continuous observations as well. You could color/shape the points according to whether the residual is negative or positive, and then size the point according to the absolute (or squared) residual. This is IMO not necessary here though because of the simplicity of the observed values.
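For the continuous case, the encoding described above could be sketched as follows. This is only an illustration: the function name `residual_encodings`, the size constants, and the specific colors and markers are all assumptions, and the residual is taken to be the observed value minus a posterior predictive point estimate such as the mean.

```python
def residual_encodings(observed, predicted, base_size=20.0, scale=60.0):
    """Map each residual to a color, a marker, and a point size.

    Color and marker both encode the sign of the residual (a redundant
    encoding); size grows with the absolute residual.
    """
    encodings = []
    for y, y_hat in zip(observed, predicted):
        resid = y - y_hat
        positive = resid > 0  # a residual of exactly zero is grouped with the negatives
        encodings.append(
            {
                "residual": resid,
                "color": "tab:red" if positive else "tab:blue",
                "marker": "^" if positive else "v",
                "size": base_size + scale * abs(resid),  # use resid ** 2 for squared residuals
            }
        )
    return encodings


# Hypothetical values, for illustration only.
example = residual_encodings([2.0, 1.0], [1.5, 3.0])
```

Each dictionary can then be passed point by point to a scatter call, using the `size`, `color`, and `marker` entries as the corresponding plot arguments.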
