Do these residual plots violate the linearity and homogeneity assumptions for linear regression?

In all the plots, too many points seem to be clustered at negative values.
Plots 3 & 4 seem to have random enough patterns, but plots 1 & 2 seem to have a negatively sloped trend.

If these plots violate the linearity and homogeneity assumptions, I should stop using the regression model, correct?

[Figure: residual plot, variable 1]

[Figure: residual plot, variable 2]

[Figure: residual plot, variable 3]

[Figure: residual plot, variable 4]

Yes, the residual plots for variables 1 & 2 are problematic. I don't necessarily see any heterogeneity of variance (heteroscedasticity), or even non-linearity, but they certainly show non-independence: you can easily guess whether a residual will be above or below 0 based on whether its neighbors are.
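One way to put a number on this kind of non-independence (a technique I'm adding here, not one the answer itself uses) is a Wald-Wolfowitz runs test on the signs of the residuals, sorted by the predictor on the plot's x-axis. Too few runs of same-signed residuals means neighbours tend to share a sign. The sketch below is a minimal, stdlib-only version that uses the usual normal approximation, so treat it as a rough guide for small samples; the function name `runs_test` and the example residuals are hypothetical.

```python
import math


def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of residuals.

    `residuals` should already be sorted by the predictor on the plot's
    x-axis. Exact zeros are dropped. Returns (runs, expected_runs, z),
    using the normal approximation for z.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)            # number of residuals above 0
    n2 = len(signs) - n1       # number of residuals below 0
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need residuals on both sides of 0, at least 3 in total")
    # A new run starts every time the sign flips between neighbours.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical residuals that come in same-sign blocks, like plots 1 and 2.
clustered = [1, 2, 1, 3, -1, -2, -1, -3, 1, 2, 1, 2, -1, -1, -2, -1]
runs, expected, z = runs_test(clustered)
print(f"runs={runs}, expected={expected:.1f}, z={z:.2f}")
```

For these blocky residuals the test finds 4 runs where about 9 would be expected under independence, giving a z of roughly -2.6. A strictly alternating sequence would instead give a large positive z.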

I do want to clear up a small misunderstanding. You say there may be too many residuals below 0. The assumption is not that 50% of the residuals must be below 0 and 50% above; rather, it is that the mean of the residuals is 0. If the residuals have a skewed distribution, the mean won't equal the median, and you can validly have different numbers of residuals above and below 0.
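A small simulation illustrates the point. The sketch below is a hypothetical example, not the asker's data: the errors are drawn from an Exponential(1) distribution shifted down by 1, which has mean 0 but median ln 2 - 1 (about -0.31). After an ordinary least-squares fit with an intercept, the residuals average to 0 up to floating-point error, yet roughly 63% of them fall below 0. The helper `fit_simple_ols` is an illustrative closed-form fit, not a library function.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x (intercept included)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


random.seed(1)
n = 1000
x = [random.uniform(0, 10) for _ in range(n)]
# Right-skewed errors with mean 0: Exponential(1) shifted down by 1.
y = [2 + 0.5 * xi + (random.expovariate(1.0) - 1) for xi in x]

a, b = fit_simple_ols(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

mean_resid = statistics.fmean(resid)               # ~0 because of the intercept
share_below = sum(r < 0 for r in resid) / n        # well above one half
median_resid = statistics.median(resid)            # negative, as for the errors
print(mean_resid, share_below, median_resid)
```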

I am perplexed, though. When the model includes an intercept, OLS forces the residuals to sum to zero and to be uncorrelated with every predictor in the model, so a linear trend in a residual-versus-predictor plot like your top two should not happen. What code or program did you use to fit the data and generate these residuals? Did you force the intercept to be 0? That is the only thing I can think of that would produce the plots you show.
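The intercept explanation can be checked on simulated data. The sketch below uses hypothetical data, not the asker's: it generates y = 5 + 0.5x + noise, fits it once with an intercept and once forced through the origin, and then regresses each set of residuals on x with an illustrative helper, `residual_slope`. With the intercept, both the residual mean and the residual slope come out at essentially 0 (floating-point noise only). Forcing the line through the origin when the true intercept is 5 leaves a clearly negative slope in the residuals, similar to the negative trend described for plots 1 and 2.

```python
import random
import statistics


def residual_slope(x, r):
    """Slope from regressing residuals r on x; ~0 means no linear trend."""
    mx, mr = statistics.fmean(x), statistics.fmean(r)
    sxr = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxr / sxx


random.seed(0)
x = [float(i) for i in range(1, 51)]
# Hypothetical data whose true intercept (5) is far from 0.
y = [5 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Ordinary fit with an intercept: y = a + b*x.
mx, my = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b = sxy / sxx
a = my - b * mx
r_with = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Fit forced through the origin: y = beta*x, beta = sum(x*y) / sum(x^2).
beta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
r_origin = [yi - beta * xi for xi, yi in zip(x, y)]

print("with intercept:", residual_slope(x, r_with), statistics.fmean(r_with))
print("through origin:", residual_slope(x, r_origin), statistics.fmean(r_origin))
```

The through-origin fit still makes the residuals orthogonal to x itself, but because they no longer average to 0, they end up correlated with x, which is what shows up as a sloped band in the plot.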
