I am fitting a linear regression model with 4 predictors:
lm(Outcome ~ Predictor1 + Predictor2 + Predictor3 + Predictor4, data=dat.s)
I'm finding that the model predictions are consistently off as shown in this graph:
The model clearly overestimates the low values and underestimates the high values, but the mis-estimation is very linear: it seems like the model should be able to just adjust the slope and fit the data better. Why is that not happening? In case it helps, here are scatterplots of the Outcome against each of the four Predictors:
The outlierTest function from the car package did not identify any outliers.
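Roughly, the workflow looks like this (a minimal sketch: dat.s holds the Outcome and the four Predictors, and the plotting calls are illustrative, not the exact code behind the graph):

library(car)                      # provides outlierTest()
fit = lm(Outcome ~ Predictor1 + Predictor2 + Predictor3 + Predictor4, data = dat.s)
plot(fitted(fit), dat.s$Outcome)  # predicted vs. observed: the graph described above
abline(0, 1)                      # points should lie near this line if predictions are unbiased
outlierTest(fit)                  # Bonferroni test on the largest studentized residual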
Best Answer
You shouldn't plot the residuals against the observed outcome values, because the two are correlated by construction; instead, plot the residuals against the fitted values, i.e. $\hat{Y}$. To see this, consider this simple data set:
x = runif(20)
y = rnorm(20)
Clearly $x$ and $y$ are unrelated. Now we fit a simple linear regression model
m = lm(y ~ x)
and plot the residuals against the outcome and against the fitted values
plot(y, residuals(m))
plot(fitted.values(m), residuals(m))
to get the two plots below.

[Figure: residuals vs. the outcome $y$ show a strong linear trend; residuals vs. the fitted values show random scatter.]

Notice that plotting the residuals against the outcome $y$ produces a clear pattern even though $x$ and $y$ are unrelated: the residuals are correlated with $y$ by construction, so this plot gives no insight into model fit. Hence the pattern in your graph does not necessarily mean anything is wrong with your model, since we don't expect random scatter when plotting against the outcome.
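A quick numerical check makes the same point. This is a minimal sketch continuing the toy example, with a seed added for reproducibility:

set.seed(1)                          # reproducibility of the toy example
x = runif(20)
y = rnorm(20)
m = lm(y ~ x)
cor(y, residuals(m))                 # close to 1: residuals track the outcome almost one-for-one
cor(fitted.values(m), residuals(m))  # essentially 0: guaranteed by least squares with an intercept

In general, for least squares with an intercept, the residuals are exactly uncorrelated with the fitted values, while their correlation with the observed $y$ is $\sqrt{1 - R^2}$, which is near 1 whenever the predictors explain little.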