For OLS parameter estimates to be consistent it must be the case that
E(u|x)=0. Is it true?
E(u|x)=0 is a required condition for unbiasedness. But as far as I understand, unbiasedness does not necessarily mean consistency. Therefore I am really confused.
Best Answer
Ok. The model is, in matrix notation and with conformable dimensions, $$\mathbf y = \mathbf X\beta + \mathbf u $$
The OLS estimator is
$$\hat \beta = (\mathbf X'\mathbf X)^{-1}\mathbf X' \mathbf y = (\mathbf X'\mathbf X)^{-1}\mathbf X' (\mathbf X\beta + \mathbf u) $$
$$= (\mathbf X'\mathbf X)^{-1}\mathbf X' \mathbf X\beta + (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u = \beta + (\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u$$
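As a quick numerical sanity check of this decomposition, here is a minimal sketch in Python with NumPy. The sample size, the coefficient values and the normal draws are all illustrative assumptions, not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) design: an intercept plus two standard normal regressors.
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta = np.array([1.0, -2.0, 0.5])   # assumed "true" coefficients
u = rng.standard_normal(n)          # assumed error draws
y = X @ beta + u

# OLS estimator (X'X)^{-1} X'y, computed by solving the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The sampling-error term (X'X)^{-1} X'u from the decomposition.
sampling_error = np.linalg.solve(X.T @ X, X.T @ u)

# beta_hat should equal beta + sampling_error up to floating-point error.
print(np.allclose(beta_hat, beta + sampling_error))
```

In real data $\mathbf u$ is unobserved, so this check is only possible in a simulation where the errors are generated by hand.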
For consistency we examine
$$\operatorname{plim}\hat \beta = \operatorname{plim}\beta + \operatorname{plim}\left[(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u\right] = \beta + \operatorname{plim}\left[\left(\frac 1n\mathbf X'\mathbf X\right)^{-1}\left(\frac 1n\mathbf X'\mathbf u\right)\right] $$
And here is the crucial point that lets us get by with a weaker assumption for consistency than for unbiasedness: for unbiasedness we would face $E\left[(\mathbf X'\mathbf X)^{-1}\mathbf X'\mathbf u\right]$, and in order to "insert" the expected value into the expression we have to condition on $\mathbf X$. This leads us to the expression $E(\mathbf u\mid \mathbf X)$ and the need to assume that it equals zero, i.e. to assume "mean-independence" between the error term and the regressors.
But $\operatorname{plim}$ is a more "flexible" operator than $E$: under $\operatorname{plim}$, products of expressions can be decomposed into products of the individual limits, provided these exist (something that under the expected value requires independence). Also, $\operatorname{plim}$ can "go inside the expression" (while $E$ cannot, unless the function is affine), as long as the function is a continuous transformation (and it very rarely isn't). So
$$\operatorname{plim}\left[\left(\frac 1n\mathbf X'\mathbf X\right)^{-1}\left(\frac 1n\mathbf X'\mathbf u\right)\right] = \operatorname{plim}\left(\frac 1n\mathbf X'\mathbf X\right)^{-1}\operatorname{plim}\left(\frac 1n\mathbf X'\mathbf u\right)$$
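The difference between the two operators can be seen in a small simulation. The sketch below uses an example that is not in the original answer: the nonlinear function $g(\bar x) = 1/\bar x$ applied to the sample mean of exponential draws with an assumed mean of 2. The expected value does not pass through $g$ in small samples, while the probability limit does.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0  # assumed true mean of the draws, so 1 / E(x_bar) = 0.5

def mean_of_inverse_sample_mean(n, reps):
    """Monte Carlo estimate of E[1 / x_bar] for samples of size n."""
    x_bar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    return (1.0 / x_bar).mean()

# Small n: E[1 / x_bar] differs visibly from 1 / E[x_bar] = 0.5,
# because 1/x is not affine (theory gives n / ((n - 1) * mu) here).
small_n = mean_of_inverse_sample_mean(n=5, reps=100_000)

# Large n: x_bar concentrates around mu, so 1 / x_bar concentrates around 1 / mu.
large_n = mean_of_inverse_sample_mean(n=5_000, reps=1_000)

print(f"n = 5:    average of 1/x_bar = {small_n:.3f}")
print(f"n = 5000: average of 1/x_bar = {large_n:.3f}")
```

The same logic is what allows the inverse of $\frac 1n\mathbf X'\mathbf X$ to be handled separately from $\frac 1n\mathbf X'\mathbf u$ in the limit.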
For consistency we need to assume that the first $\operatorname{plim}$ is finite, but this is an assumption on the properties of the regressor matrix, unrelated to the error term. So we are left with the second $\operatorname{plim}$ which, written for clarity using sums, is

$$\operatorname{plim}\left(\frac 1n\mathbf X'\mathbf u\right) = \left[\begin{matrix} \operatorname{plim}\frac 1n\sum_{i=1}^nx_{1i}u_i \\ \vdots \\ \operatorname{plim}\frac 1n\sum_{i=1}^nx_{ki}u_i \\ \end{matrix}\right] \rightarrow\left[\begin{matrix} \frac 1n\sum_{i=1}^nE(x_{1i}u_i) \\ \vdots \\ \frac 1n\sum_{i=1}^nE(x_{ki}u_i) \\ \end{matrix}\right] $$

The last transformation is due to the usual assumptions that permit the application of the law of large numbers.
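To see the two sample moments settle down as $n$ grows, here is a minimal simulation sketch. It assumes i.i.d. standard normal draws for a single regressor and for the error, generated independently of each other, so that $E(x_iu_i)=0$ holds by construction; none of these choices come from the original answer.

```python
import numpy as np

rng = np.random.default_rng(2)

def moment_matrices(n):
    """Return (X'X / n, X'u / n) for one simulated sample of size n."""
    x = rng.standard_normal(n)
    u = rng.standard_normal(n)  # drawn independently of x, so E(x_i u_i) = 0
    X = np.column_stack([np.ones(n), x])  # intercept plus one regressor
    return X.T @ X / n, X.T @ u / n

for n in (100, 10_000, 1_000_000):
    Q_n, m_n = moment_matrices(n)
    print(n, np.round(Q_n, 3).tolist(), np.round(m_n, 4).tolist())

# Keep one large-sample draw for inspection.
Q_big, m_big = moment_matrices(1_000_000)
```

Under these assumptions, $\frac 1n\mathbf X'\mathbf X$ should approach the identity matrix (finite and invertible) and $\frac 1n\mathbf X'\mathbf u$ should approach the zero vector.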
Exactly because we have been able to "separate" $(\mathbf X'\mathbf X)^{-1}$ from $\mathbf X'\mathbf u$ (due to the fact that we are examining the $\operatorname{plim}$ and not $E$), we ended up looking only at the contemporaneous relation between each regressor and the error term. And so what we need to assume for consistency of the OLS estimator is only that $E(x_{ki}u_i) =0 \; \forall k, \; \forall i$ (contemporaneous uncorrelatedness). This is much weaker than $E(\mathbf u\mid \mathbf X)=0$, which requires mean-independence, and not only contemporaneously but across time too (since we condition the whole error vector on the whole regressor matrix).
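A standard case where the weaker condition holds but $E(\mathbf u\mid \mathbf X)=0$ fails is a regression on a lagged dependent variable. The sketch below uses an assumed AR(1) model $y_t = \rho y_{t-1} + u_t$ with $\rho = 0.5$ and i.i.d. standard normal errors (an illustration, not part of the original answer). Here $E(y_{t-1}u_t)=0$, but $u_t$ is correlated with the later regressor values $y_t, y_{t+1}, \dots$, so OLS is biased in small samples and still consistent.

```python
import numpy as np

def ar1_ols_estimates(rho, n, reps, rng):
    """Return `reps` OLS estimates of rho from y_t = rho * y_{t-1} + u_t."""
    u = rng.standard_normal((reps, n + 1))
    y = np.zeros((reps, n + 1))
    # Start from the stationary distribution, variance 1 / (1 - rho^2).
    y[:, 0] = u[:, 0] / np.sqrt(1.0 - rho**2)
    for t in range(1, n + 1):
        y[:, t] = rho * y[:, t - 1] + u[:, t]
    lagged, current = y[:, :-1], y[:, 1:]
    # No-intercept OLS slope: sum(y_{t-1} * y_t) / sum(y_{t-1}^2).
    return (lagged * current).sum(axis=1) / (lagged**2).sum(axis=1)

rng = np.random.default_rng(3)
rho = 0.5  # assumed true autoregressive coefficient

small = ar1_ols_estimates(rho, n=20, reps=5_000, rng=rng)
large = ar1_ols_estimates(rho, n=2_000, reps=500, rng=rng)

print(f"n = 20:   mean estimate = {small.mean():.3f}")
print(f"n = 2000: mean estimate = {large.mean():.3f}")
```

With these settings the average estimate at $n=20$ should sit visibly below 0.5, while at $n=2000$ it should be very close to 0.5, which is the pattern of "biased but consistent" described above.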