I recently ran a regression of the following form:
mod <- lm(log(y) ~ log(x))
to examine how y scales as a function of x. I then examined the top 20 residuals (super-linear scaling with respect to the predicted trendline) and the bottom 20 residuals (sub-linear scaling with respect to the predicted trendline), and noticed that these observations share a startling regularity in a third variable, z: all the positive residuals have very large values of z, and all the negative residuals have very small values of z.
I want to be able to demonstrate that this is a meaningful, statistically significant pattern. My intuition was to run
lm(z ~ resid(mod))
but this strikes me as wrong. Is there a way to capture this pattern using residuals, or is this the wrong way of thinking about it altogether?
Re: Heteroskedastic Standard Errors
Here is a QQ plot:
[Figure: QQ plot of the residuals]
Residuals vs. Fitted Values:
[Figure: residuals vs. fitted values]
The second graph in particular is somewhat alarming. Should I be worried about heteroskedasticity with a huge dataset (300,000 obs.) if I'm using robust SEs?
Best Answer
Well, you want to understand how the residuals could depend on z. So the model you should look at would be
lm( resid(mod) ~ z )
but maybe first (show us) the corresponding plot. But maybe what you see is that the spread of the residuals depends on z, not their mean (heteroskedasticity). Then try a model like
lm( I(resid(mod)^2) ~ z )
(or replace the square with the absolute value). If this confirms the pattern, maybe try a more complex model, such as a gamlss, which permits simultaneous estimation of mean and variance functions. You could look through gamlss for some examples.
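As a rough sketch of the two residual regressions above, here is a small simulation in base R. The data-generating process, the sample size, and every name other than mod are assumptions made up for illustration; they are not the asker's data. In this made-up setup z shifts both the mean and the spread of log(y), so both regressions should pick up a positive slope.

```r
# Hypothetical simulated data: z affects both the mean and the spread of log(y).
set.seed(1)
n <- 2000
x <- exp(rnorm(n))
z <- runif(n)
y <- exp(1 + 0.8 * log(x) + 0.5 * z + rnorm(n, sd = 0.2 + 0.5 * z))

# The original regression from the question
mod <- lm(log(y) ~ log(x))
r <- resid(mod)

# Does the mean of the residuals depend on z?
mean_fit <- lm(r ~ z)

# Does the spread of the residuals depend on z?
# I() is needed so that ^ is treated as arithmetic, not formula syntax.
var_fit <- lm(I(r^2) ~ z)

# Coefficient tables: estimate, std. error, t value, p-value
print(summary(mean_fit)$coefficients)
print(summary(var_fit)$coefficients)
```

Plotting first, e.g. plot(z, r), is still worth doing before reading anything into these tables. The p-values should also be read with some caution, since the second-stage fits treat the residuals as if they were raw data.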