Why are the residuals in this model so linearly skewed?

I am fitting a simple linear regression model with 4 predictors:

lm(Outcome ~ Predictor1 + Predictor2 + Predictor3 + Predictor4, data=dat.s)

I'm finding that the model predictions are consistently off as shown in this graph:
[scatterplot of residuals vs. predicted values]

The model clearly overestimates the low values and underestimates the high values, but the misestimation is very linear: it seems like the model should be able to just adjust the slope and fit the data better. Why is that not happening? In case it helps, here are scatterplots of the Outcome against each of the four Predictors:
[scatterplots of Outcome vs. each of the four Predictors]

The outlierTest function from the car package did not identify any outliers.

You shouldn't plot the residuals against the outcome values, because the two are correlated; instead, plot the residuals against the fitted values, i.e. $\hat{Y}$. To see this, consider this simple data set:

x = runif(20)
y = rnorm(20)

Clearly $x$ and $y$ are unrelated. Now fit a simple linear regression model

m = lm(y ~ x) 

and plot the residuals against the outcome and against the fitted values

plot(y, residuals(m))
plot(fitted.values(m), residuals(m))

to get:

[residuals plotted against y, and residuals plotted against the fitted values]

Notice that plotting against the outcome $y$ shows a strong linear pattern even though nothing is wrong with the model: the residuals are mechanically correlated with $y$. So it's not clear that anything is wrong with your model, since in a residuals-vs-outcome plot we don't expect random scatter in the first place.
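You can check this correlation directly. A minimal sketch (using a larger sample and a fixed seed for reproducibility; it relies on the standard fact that for a least-squares fit with an intercept, cor(residuals, fitted) is exactly 0 while cor(residuals, y) equals sqrt(1 - R^2)):

```r
# Unrelated x and y, as in the example above
set.seed(42)
x <- runif(200)
y <- rnorm(200)
m <- lm(y ~ x)

# Residuals are orthogonal to the fitted values by construction
cor(residuals(m), fitted.values(m))  # ~ 0, up to floating-point error

# ...but they are strongly correlated with the raw outcome y,
# because here R^2 is tiny and cor(residuals, y) = sqrt(1 - R^2)
cor(residuals(m), y)
all.equal(cor(residuals(m), y), sqrt(1 - summary(m)$r.squared))
```

So a clean-looking trend in a residuals-vs-outcome plot carries no diagnostic information; only the residuals-vs-fitted plot should look like random scatter for a well-specified model.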
