# Solved – Distribution of Y influenced by predictor X in simple linear regression

One of the assumptions in simple linear regression is that the error term is normally distributed. I found the following quote on the internet:

> You’ll notice there is nothing similar about Y. ε’s distribution is influenced by Y’s, which is why Y has to be continuous, unbounded, and measured on an interval or ratio scale.
>
> But Y’s distribution is also influenced by the X’s. ε’s isn’t. That’s why you can get a normal distribution for ε, but lopsided, chunky, or just plain weird-looking Y.

My question is: is this true? I actually thought it was the other way around, that the distribution of Y is NOT influenced by the predictors.

## Answer

The text the OP quotes from starts with a mistake: it refers to the "residuals", while all these assumptions concern the errors (the residuals are the estimated errors).
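The distinction matters because residuals inherit constraints from the fitting step that the errors do not have. Below is a minimal simulation sketch with arbitrary assumed values ($a = 1$, $b = 2$, $\sigma = 1$, $x$ uniform on $[0, 10]$). With an intercept in the model, the OLS residuals sum to zero and are orthogonal to $x$ by construction, whereas the simulated true errors generally are not.

```python
import random


def ols_fit(xs, ys):
    """Closed-form simple OLS estimates: returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def errors_and_residuals(n=50, a=1.0, b=2.0, sigma=1.0, seed=1):
    """Simulate Y = a + b*x + u, then compare the true errors u with the OLS residuals."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    errors = [rng.gauss(0.0, sigma) for _ in range(n)]  # true, normally unobservable errors
    ys = [a + b * x + e for x, e in zip(xs, errors)]
    a_hat, b_hat = ols_fit(xs, ys)
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]  # estimated errors
    return xs, errors, residuals


xs, errors, residuals = errors_and_residuals()
print(sum(errors))                                # generally not zero
print(sum(residuals))                             # zero up to floating-point rounding
print(sum(x * r for x, r in zip(xs, residuals)))  # also zero up to rounding
```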

Apart from that, when we specify a regression equation we state that $Y$, as a variable, is a function of the $X$'s and of the error term. It is then natural to say that the distribution of $Y$ will be influenced by the distributions of the $X$'s and of the error term, since together they determine $Y$ itself.
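To make "determine" precise under one extra assumption not stated in the answer: suppose the error enters additively, $Y = g(X) + u$ for some function $g$ (a hypothetical name used only here), with $u$ independent of $X$ and both having densities. Then the conditional density of $Y$ at a given $x$ is the error density shifted by $g(x)$, and the marginal density of $Y$ is an average of those shifted copies, weighted by the density of $X$:

```latex
f_{Y \mid X}(y \mid x) = f_u\bigl(y - g(x)\bigr),
\qquad
f_Y(y) = \int f_{Y \mid X}(y \mid x)\, f_X(x)\, dx
       = \int f_u\bigl(y - g(x)\bigr)\, f_X(x)\, dx .
```

With $g(x) = a + bx$, the marginal formula is the random-regressor case and the conditional one is the fixed-regressor case.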

As a simple example, assume that $Y = a + bX + u$, where $u$ follows a Normal distribution but $X$ follows, say, a Gamma distribution. Then the distribution of $Y$ cannot be normal (for $b \neq 0$), and what it turns out to be will also depend on the distribution of $X$ and on how it "mingles" with the distribution of $u$.
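A quick simulation illustrates this. The parameter values below ($a = 0$, $b = 1$, Gamma shape 2 and scale 1, $\sigma = 1$) are arbitrary assumptions, and $X$ is drawn independently of $u$. A normal distribution has zero skewness, so a clearly nonzero sample skewness of $Y$ is enough to rule out normality; adding cumulants of the independent terms gives the value it should approach.

```python
import random

# Illustrative parameters (assumptions, not from the answer).
A, B = 0.0, 1.0          # intercept and slope
SHAPE, SCALE = 2.0, 1.0  # X ~ Gamma(shape, scale)
SIGMA = 1.0              # u ~ N(0, SIGMA^2), independent of X
N = 100_000


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def theoretical_skewness_y(b, shape, scale, sigma):
    """Skewness of Y = a + b*X + u via cumulants, which add for independent terms."""
    k2 = b ** 2 * shape * scale ** 2 + sigma ** 2  # Var(bX) + Var(u)
    k3 = b ** 3 * 2 * shape * scale ** 3           # third cumulant of bX; the normal u adds none
    return k3 / k2 ** 1.5


rng = random.Random(0)
xs = [rng.gammavariate(SHAPE, SCALE) for _ in range(N)]
us = [rng.gauss(0.0, SIGMA) for _ in range(N)]
ys = [A + B * x + u for x, u in zip(xs, us)]

print("skewness of u:", round(sample_skewness(us), 3))  # close to 0: symmetric errors
print("skewness of Y:", round(sample_skewness(ys), 3))  # clearly positive: Y is not normal
print("theory for Y :", round(theoretical_skewness_y(B, SHAPE, SCALE, SIGMA), 3))
```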

Even if the regressors are "deterministic", meaning that they cannot be said to follow a statistical distribution, they still affect the parameters of the distribution of $Y$: in the previous example with deterministic regressors, for each given value $x$ the distribution of $Y$ is normal with its mean shifted to $a + bx$ (but with the same variance as $u$).
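With fixed regressors the statement is about the distribution of $Y$ at each given $x$. The sketch below uses arbitrary illustrative values ($a = 1$, $b = 2$, $\sigma = 1.5$, and three fixed $x$ values) and draws $Y$ repeatedly at each $x$: the sample mean tracks $a + bx$, while the standard deviation stays at $\sigma$.

```python
import random
import statistics

# Illustrative parameters (assumptions, not from the answer).
A, B, SIGMA = 1.0, 2.0, 1.5
FIXED_X = (0.0, 5.0, 10.0)  # regressor values treated as fixed, not random
REPS = 50_000


def draws_at(x, rng):
    """Repeated draws of Y = A + B*x + u for one fixed value x, with u ~ N(0, SIGMA^2)."""
    return [A + B * x + rng.gauss(0.0, SIGMA) for _ in range(REPS)]


rng = random.Random(2)
results = {}
for x in FIXED_X:
    ys = draws_at(x, rng)
    results[x] = (statistics.fmean(ys), statistics.pstdev(ys))
    print(f"x = {x:4.1f}: mean {results[x][0]:7.3f} (theory {A + B * x:5.1f}), "
          f"sd {results[x][1]:.3f} (theory {SIGMA})")
```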

In the "conditional expectation function" (CEF) approach, we in principle consider the joint distribution of $\{Y, X\}$ and the resulting conditional distribution of $Y$ given $X$, and the distribution of the CEF error springs from these. That is, here the error is not treated as a separate variable but is defined as $u \equiv Y - E(Y \mid X)$.
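A small exact computation can illustrate the CEF view. The joint pmf of $(X, Y)$ below is hypothetical, chosen only for illustration. The sketch computes $E(Y \mid X)$, defines $u = Y - E(Y \mid X)$, and derives the distribution of $u$ from the joint distribution instead of assuming it; `Fraction` keeps the arithmetic exact.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), chosen only for illustration.
JOINT = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 3): Fraction(3, 8),
}


def marginal_x(joint):
    """P(X = x) for every x in the support."""
    px = defaultdict(Fraction)
    for (x, _y), p in joint.items():
        px[x] += p
    return dict(px)


def cef(joint):
    """Conditional expectation function m(x) = E(Y | X = x)."""
    px = marginal_x(joint)
    num = defaultdict(Fraction)
    for (x, y), p in joint.items():
        num[x] += y * p
    return {x: num[x] / px[x] for x in px}


def error_pmf(joint):
    """Distribution of u = Y - E(Y | X), derived from the joint pmf."""
    m = cef(joint)
    pu = defaultdict(Fraction)
    for (x, y), p in joint.items():
        pu[y - m[x]] += p
    return dict(pu)


def conditional_error_moment(joint, power):
    """E(u**power | X = x) for every x; power=1 gives 0 by construction."""
    m = cef(joint)
    px = marginal_x(joint)
    num = defaultdict(Fraction)
    for (x, y), p in joint.items():
        num[x] += (y - m[x]) ** power * p
    return {x: num[x] / px[x] for x in px}


print(cef(JOINT))                          # E(Y | X = 0) and E(Y | X = 1)
print(error_pmf(JOINT))                    # support and probabilities of u
print(conditional_error_moment(JOINT, 1))  # zero for every x
print(conditional_error_moment(JOINT, 2))  # conditional variance of u differs across x
```

Here $E(u \mid X) = 0$ holds by construction, yet $u$ is discrete rather than normal, and its conditional variance differs across values of $X$ (1/4 versus 27/16). All of this is inherited from the joint distribution of $(Y, X)$.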

So in all cases, the distribution of $Y$ is influenced by $X$, in one way or another.
