One of the assumptions of simple linear regression is that the error term is normally distributed. I found the following quote online:
"You’ll notice there is nothing similar about Y. ε’s distribution is influenced by Y’s, which is why Y has to be continuous, unbounded, and measured on an interval or ratio scale.
But Y’s distribution is also influenced by the X’s. ε’s isn’t. That’s why you can get a normal distribution for ε, but lopsided, chunky, or just plain weird-looking Y."
My question is: is this true? I actually thought it was the other way around, that the distribution of Y is NOT influenced by the predictors.
Best Answer
The quote the OP links to starts with a mistake by referring to the "residuals" while all these assumptions refer to the errors (the residuals are the estimated errors).
Apart from that, when we specify a regression equation, we state that $Y$, as a random variable, is a function of the $X$'s and the error term. It is then natural to say that the distribution of $Y$ is influenced by the distributions of the $X$'s and of the error term, since together they determine $Y$.
As a simple example, assume that $Y = a + bX + u$ with $b \neq 0$, where $u$ follows a normal distribution but $X$ follows, say, a Gamma distribution independently of $u$. Then the distribution of $Y$ cannot be normal; what it will be depends on the distribution of $X$ as well, and on how it "mingles" with the distribution of $u$.
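A quick simulation can make this concrete. The sketch below is only an illustration under assumed values: the coefficients `a` and `b`, the Gamma shape and scale, the error standard deviation `sigma`, and the helper names `sample_skewness` and `simulate` are all hypothetical choices, not anything taken from the quote or the question. It draws a Gamma-distributed $X$ and an independent normal $u$, builds $Y = a + bX + u$, and compares the sample skewness of $u$ with that of $Y$.

```python
import numpy as np


def sample_skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


def simulate(n=200_000, a=1.0, b=2.0, shape=2.0, scale=1.0, sigma=1.0, seed=0):
    """Draw X ~ Gamma, u ~ Normal (independent of X), and form Y = a + b*X + u.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.gamma(shape, scale, size=n)   # skewed, non-normal regressor
    u = rng.normal(0.0, sigma, size=n)    # normal error term
    y = a + b * x + u                     # the regression equation
    return x, u, y


x, u, y = simulate()
skew_u = sample_skewness(u)
skew_y = sample_skewness(y)

# The error is symmetric (skewness near 0), while Y inherits the
# right skew of the Gamma-distributed X.
print(f"skewness of u: {skew_u:.3f}")
print(f"skewness of Y: {skew_y:.3f}")
```

With these assumed values the errors should look symmetric while $Y$ is clearly right-skewed, which is the situation the quote describes: a normal error term alongside a "lopsided" $Y$.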
Even if the regressors are "deterministic", meaning that they cannot be said to follow a statistical distribution, they still affect the parameters of the distribution of $Y$: in the previous example with a deterministic regressor, $Y$ is normal with its mean shifted to $a + bX$, but with the same variance as $u$.
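Written out for a single observation, and assuming (these symbols are not in the original) that observation $i$ has fixed regressor value $x_i$ and that $u_i$ has mean zero and variance $\sigma^2$:

$$Y_i = a + b x_i + u_i, \quad u_i \sim N(0, \sigma^2) \;\Longrightarrow\; Y_i \sim N(a + b x_i, \sigma^2).$$

Each $Y_i$ is normal, but the means differ across observations, so a histogram of the pooled $Y$ values is a mixture of normals and need not look normal at all.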
In the "conditional expectation function" approach, in principle we consider the joint distribution of $\{Y,X\}$ and the resulting conditional one, and the distribution of the conditional expectation function error springs from these (i.e. here the error is not treated as a separate variable but is defined as $u \equiv Y - E(Y \mid X)$).
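One consequence of this definition is worth spelling out as a short worked step, assuming only that $E(Y \mid X)$ exists. Taking the conditional expectation of the error gives

$$E(u \mid X) = E(Y \mid X) - E\big(E(Y \mid X) \mid X\big) = E(Y \mid X) - E(Y \mid X) = 0,$$

so the error has conditional mean zero by construction, while $Y = E(Y \mid X) + u$ is centered on a function of $X$.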
So in all cases, the distribution of $Y$ is influenced by $X$, in one way or the other.