When is the linear regression estimate of $\beta_1$ in the model

$$ Y = X_1\beta_1 + \delta $$

unbiased, given that the $(x,y)$ pairs are generated with the following model?

$$ Y = X_1\beta_1 + X_2\beta_2 + \delta $$

We have that the expected value of $\hat{\beta}_1$ is

\begin{align*}
E[\hat{\beta}_1|X_1,X_2] &= E[(X_1^TX_1)^{-1}X_1^T(X_1\beta_1+X_2\beta_2+\delta)|X_1,X_2]\\
&= \beta_1 + E[(X_1^TX_1)^{-1}X_1^TX_2\beta_2+(X_1^TX_1)^{-1}X_1^T\delta|X_1,X_2]\\
&= \beta_1+E[(X_1^TX_1)^{-1}X_1^TX_2\beta_2 | X_1,X_2] + 0\\
\end{align*}

Now, **when is the second term 0 (i.e., $\hat{\beta}_1$ is an unbiased estimator)?** I have read that it is 0 if $X_1$ and $X_2$ are independent.

But which property allows me to conclude that?


#### Best Answer

It is zero when the columns of $\mathbf{X}_1$ are perpendicular to the columns of $\mathbf{X}_2$, so that the two column spaces are orthogonal to one another. In expectation, this means the variables need to be *uncorrelated* with one another, which is not quite the same thing as independence. It is in fact a weaker condition, since independence *implies* zero correlation but not conversely.

If the variables are correlated, however, and you proceed to estimate $\boldsymbol{\beta}_1$ only, you will end up with a biased estimator. This is known as omitted variable bias.
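A quick numerical sketch of this (not from the original post; variable names and the 0.5 correlation coefficient are my own choices) using NumPy: when $x_2$ is correlated with $x_1$, the short regression of $y$ on $x_1$ alone picks up part of $\beta_2$, shifting the estimate by roughly $\beta_2 \cdot \mathrm{Cov}(x_1,x_2)/\mathrm{Var}(x_1)$; when $x_2$ is independent of $x_1$, the short regression recovers $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1, beta2 = 2.0, 3.0

x1 = rng.normal(size=n)

# Case 1: x2 correlated with x1 -> omitted variable bias.
x2 = 0.5 * x1 + rng.normal(size=n)
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
# OLS of y on x1 alone: (x1'x1)^{-1} x1'y
b1_biased = (x1 @ y) / (x1 @ x1)

# Case 2: x2 independent of x1 -> short regression is unbiased.
x2_ind = rng.normal(size=n)
y_ind = beta1 * x1 + beta2 * x2_ind + rng.normal(size=n)
b1_ok = (x1 @ y_ind) / (x1 @ x1)

print(b1_biased)  # near beta1 + beta2 * 0.5 = 3.5, not 2.0
print(b1_ok)      # near beta1 = 2.0
```

Here the bias in the first case is $\beta_2$ times the coefficient from regressing $x_2$ on $x_1$, which matches the term $E[(X_1^TX_1)^{-1}X_1^TX_2\beta_2]$ in the derivation above.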