I am observing the following QQ plot produced from an OLS linear regression fit of my data:
Many other SE questions discuss QQ plot interpretation, but this is an extremely regular (but non-linear) pattern that I'm not sure how to interpret. To me this suggests that the linear mean function poorly estimates the response, but what can I learn from this QQ plot? (Perhaps it suggests the data were generated from a beta distribution?)
The residuals seem to follow a Gaussian distribution, and the fitted plot seems pretty okay (although I don't know how to check for equal variance).
Any help with interpretation of these results would be greatly appreciated. If it helps, the outcome is a text sentiment score in the range (-2, 2).
Edit: A histogram of the residuals. A one-sample Kolmogorov-Smirnov test (ks.test(resid(md), y=pnorm)) leads me to reject the null hypothesis that the residuals are normally distributed.
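One thing worth checking about this test: with y = pnorm and no further arguments, ks.test compares the sample against a standard normal, N(0, 1), so residuals whose SD differs from 1 can be rejected even when they are Gaussian. The sketch below illustrates this on simulated stand-in residuals (the original data are not available, so res and its parameters are assumptions), alongside shapiro.test as one alternative.

```r
# Simulated stand-in for resid(md); the sample size and SD are assumptions.
set.seed(1)
res <- rnorm(500, mean = 0, sd = 0.5)

# With no extra arguments, "pnorm" means N(0, 1): this tests location and
# scale as well as shape, so Gaussian residuals with SD 0.5 are rejected.
ks_std <- ks.test(res, "pnorm")

# Passing mean and sd compares against a normal with matching location and
# scale. The p-value is only approximate, since both are estimated from res.
ks_fit <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

# Shapiro-Wilk needs no distribution parameters to be specified.
sw <- shapiro.test(res)

p_values <- c(ks_std = ks_std$p.value,
              ks_fit = ks_fit$p.value,
              shapiro = sw$p.value)
print(p_values)
```

If the rejection persists after matching the mean and SD, it is more likely to reflect the shape of the residual distribution than its scale.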
Best Answer
The "flatter" parts of a QQ plot indicate that, over the corresponding normal scores on the X-axis, you have more data than would be expected under a normal probability model. Here those Z-score ranges are (low) to -2, -1 to 1, and 2 to (high). For instance, on a normal curve, you'd expect about 68% of the data to lie within 1 SD of the mean. However, in your residual distribution, you have far more than 68% in that interval. Projecting the curve's value at X = -1 and X = 1 seems to give a Y of about -0.33 to 0.33. That means that the central $\pm$ 0.33 SD of the residual distribution holds about 68% of the data, where a normal distribution would hold only about 26%: a much higher concentration than in a normal distribution.
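The normal-curve figures used in this comparison can be checked directly with pnorm. The helper share_within below is a hypothetical name introduced only for illustration. It standardizes a residual vector and reports the share falling within k SD of its mean.

```r
# Share of a normal distribution within 1 SD of the mean (about 0.68)
p_1sd <- pnorm(1) - pnorm(-1)

# Share of a normal distribution within 0.33 SD of the mean (about 0.26)
p_033sd <- pnorm(0.33) - pnorm(-0.33)

# Hypothetical helper: empirical share of residuals within +/- k SD,
# after centering on the mean and scaling by the sample SD.
share_within <- function(res, k) {
  z <- (res - mean(res)) / sd(res)
  mean(abs(z) <= k)
}

print(round(c(within_1sd = p_1sd, within_033sd = p_033sd), 4))
```

With the fitted model from the question, share_within(resid(md), 0.33) would give the empirical counterpart to compare against p_033sd.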
Similarly, for the steeply sloped sections of the QQ plot (steeper than the identity, or 45-degree, line), you have fewer observations than would be expected under a normal probability model. That seems to match the residuals histogram you show. It looks like a mixture of platykurtic and leptokurtic distributions. As noted in the comments, a trimodal distribution seems to fit the bill as well.
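To see how a trimodal distribution could produce this kind of pattern, here is a minimal simulation sketch. The mixture weights, centers, and spread are assumptions chosen for illustration, not estimates from the data in the question.

```r
# Simulate a three-component normal mixture (all parameters are assumed).
set.seed(42)
n <- 2000
centers <- c(-1.5, 0, 1.5)
comp <- sample(1:3, n, replace = TRUE, prob = c(0.15, 0.70, 0.15))
sim <- rnorm(n, mean = centers[comp], sd = 0.2)

# Standardize so that the QQ plot is in SD units on both axes.
z <- (sim - mean(sim)) / sd(sim)

# Theoretical (x) and sample (y) quantiles, computed without drawing the plot.
qq <- qqnorm(z, plot.it = FALSE)

# Slope of the QQ curve in the middle: values below 1 mean a "flat" section,
# i.e. more data near the center than a normal distribution would have.
mid <- abs(qq$x) < 0.5
mid_slope <- unname(coef(lm(qq$y[mid] ~ qq$x[mid]))[2])

# Share of simulated values within 0.6 SD, against the normal benchmark.
share_sim <- mean(abs(z) <= 0.6)
share_norm <- pnorm(0.6) - pnorm(-0.6)

print(c(mid_slope = mid_slope, share_sim = share_sim, share_norm = share_norm))

# To draw the plot itself:
# qqnorm(z); qqline(z)
```

The central section comes out flatter than the 45-degree line, with steep jumps where the plot crosses the gaps between the clusters.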