# Solved – Understanding mean independence in the regression setting

The notions of uncorrelatedness ($\mathbb{E}[XY]=0$) and mean independence ($\mathbb{E}[X\mid Y]=0$) are mentioned in different settings of regression assumptions (both written here for the mean-zero case). We know that $\mathbb{E}[X\mid Y]=0$ implies $\mathbb{E}[XY]=0$ (but not the other way round). Here is a specific question about the relationship between these two notions in the regression setting.

We are looking at the effect of going to school or not on the wage of a population. Let $D_i\in\{0,1\}$ be the random variable denoting whether individual $i$ went to school ($D_i=1$) or not ($D_i=0$), and let $Y_i$ be individual $i$'s wage. Note that if we could FORCE everyone in the population to go to school, we would have a wage distribution denoted by $Y_{1i}$; similarly, if we FORCED everyone not to go to school, we would have a wage distribution denoted by $Y_{0i}$.

So we have $Y_i = D_iY_{1i} + (1-D_i)Y_{0i}\qquad(1)$.

Note that we can always write $Y_{1i} = \mu_1 + \epsilon_{1i}$ and $Y_{0i} = \mu_0 + \epsilon_{0i}$, i.e., a mean plus a noise term with mean $0$. Substituting these two equations into equation (1), we have

$Y_i=\mu_0+(\mu_1-\mu_0)D_i+\epsilon_i\qquad(2)$
where $\epsilon_i=\epsilon_{0i}+D_i(\epsilon_{1i}-\epsilon_{0i})$.
Note that $\epsilon_{1i}$ and $\epsilon_{0i}$ have mean $0$ by construction; $\epsilon_i$ itself has mean $0$ provided $\mathbb{E}[D_i(\epsilon_{1i}-\epsilon_{0i})]=0$.

So equation (2) describes the real-world relationship between wage and schooling without making any assumptions other than that the means of $Y_{1i}$ and $Y_{0i}$ are finite.
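The step from equation (1) to equation (2) is a pure algebraic identity, which can be sanity-checked with a small simulation (the means, noise scales, and assignment rule below are invented purely for illustration):

```python
import random

random.seed(0)
mu1, mu0 = 20.0, 15.0                  # hypothetical mean wages

for _ in range(1000):
    eps1 = random.gauss(0.0, 2.0)      # mean-zero noise around mu1
    eps0 = random.gauss(0.0, 3.0)      # mean-zero noise around mu0
    D = random.randint(0, 1)           # schooling indicator

    Y_eq1 = D * (mu1 + eps1) + (1 - D) * (mu0 + eps0)    # equation (1)
    eps = eps0 + D * (eps1 - eps0)                       # composite error
    Y_eq2 = mu0 + (mu1 - mu0) * D + eps                  # equation (2)

    assert abs(Y_eq1 - Y_eq2) < 1e-12  # (1) and (2) agree draw by draw
```

The agreement holds draw by draw, not just in expectation, because no moment condition was used in the substitution.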

Note that $\epsilon_i$ will in general be dependent on $D_i$ (but they are not necessarily correlated). Now suppose $\epsilon_i$ and $D_i$ are uncorrelated (first, I don't know what this means in practice); then we know that the OLS estimator is consistent (for unbiasedness of OLS, it would require mean independence, i.e., $\mathbb{E}[\epsilon_i\mid D_i]=0$). So $\mu_0$ and $\mu_1$ are identifiable. In this case, $\epsilon_i$ and $D_i$ being uncorrelated is equivalent to $\mathbb{E}[\epsilon_i D_i]=0$. I wonder if someone could explain the underlying meaning of this expression in this setting.

Note that a sufficient condition for $\mathbb{E}[\epsilon_i D_i]=0$ is that $\mathbb{E}[\epsilon_i\mid D_i]=0$. I can understand this expression very well: it says that knowing $D_i$ does not change the mean of the random variable $\epsilon_i$. Note that this is weaker than the notion of independence, since $\epsilon_i$ independent of $D_i$ means that, given the information of $D_i$, the whole distribution of $\epsilon_i$ remains the same, which is much stronger than only the first moment remaining the same (i.e., $\mathbb{E}[\epsilon_i\mid D_i]=0$).

The expression $\mathbb{E}[\epsilon_i\mid D_i]=0$ can be explained intuitively if we look at this identification problem from a different angle. We have:

$\mathbb{E}[Y_i\mid D_i=1]-\mathbb{E}[Y_i\mid D_i=0]=(\mu_1-\mu_0)+\mathbb{E}[\epsilon_i\mid D_i=1]-\mathbb{E}[\epsilon_i\mid D_i=0]=(\mu_1-\mu_0)+\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]$.

Note that we observe $\mathbb{E}[Y_i\mid D_i=1]$ and $\mathbb{E}[Y_i\mid D_i=0]$, and we want to identify $\mu_1-\mu_0$, which requires $\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$. Note that if we randomly assign schooling or no schooling to people in the population, this guarantees $\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$ (and even without randomized assignment, if we somehow know that $\mathbb{E}[\epsilon_i\mid D_i]=0$, we are still able to make this claim).
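To make the role of randomization concrete, here is a small Monte Carlo sketch (the parameter values are invented for illustration): when $D_i$ is assigned independently of the potential outcomes, the difference in observed group means recovers $\mu_1-\mu_0$.

```python
import random

random.seed(1)
mu1, mu0 = 20.0, 15.0          # illustrative mean wages, so mu1 - mu0 = 5
n = 100_000

y, d = [], []
for _ in range(n):
    Di = random.randint(0, 1)              # randomized assignment
    Y1 = mu1 + random.gauss(0.0, 2.0)      # potential wage with school
    Y0 = mu0 + random.gauss(0.0, 3.0)      # potential wage without school
    d.append(Di)
    y.append(Di * Y1 + (1 - Di) * Y0)      # observed wage, equation (1)

n1 = sum(d)
mean1 = sum(yi for yi, di in zip(y, d) if di == 1) / n1
mean0 = sum(yi for yi, di in zip(y, d) if di == 0) / (n - n1)
print(mean1 - mean0)    # close to mu1 - mu0 = 5
```

Randomization makes $\mathbb{E}[\epsilon_{1i}\mid D_i=1]=\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$, so the group-mean contrast is not contaminated by selection.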

However, if we only have that $\epsilon_i$ and $D_i$ are uncorrelated, i.e., $\mathbb{E}[\epsilon_i D_i]=0$, this does not imply $\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$. But that would mean that purely looking at the group means (i.e., $\mathbb{E}[Y_i\mid D_i=1]$ and $\mathbb{E}[Y_i\mid D_i=0]$) would not identify $\mu_1-\mu_0$, while running OLS would. Where is my logic going wrong?


The assumption here that $\epsilon_i$ and $D_i$ are uncorrelated without mean independence holding is impossible when $D_i$ takes only two values. Intuitively, correlation measures the linear relationship between the values, so for mean independence to fail in the presence of zero correlation, the mean $\mathbb{E}[\epsilon_i \mid D_i]$ would have to be a nonlinear function of $D_i$. But with only two possible values for $D_i$, there is no room for nonlinearity.
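For the binary case in the question ($D_i\in\{0,1\}$), this can be written out directly. A minimal sketch, using an arbitrary illustrative value for $p=\mathbb{P}(D_i=1)$:

```python
# With D in {0, 1} and p = P(D=1), write m1 = E[eps | D=1], m0 = E[eps | D=0].
# The two moment conditions become linear constraints on (m1, m0):
#   E[eps]     = p*m1 + (1-p)*m0 = 0
#   E[eps * D] = p*m1            = 0   (the D=0 term drops out)
p = 0.4                        # arbitrary illustrative probability, 0 < p < 1

m1 = 0.0 / p                   # forced by the second constraint: p*m1 = 0
m0 = -p * m1 / (1 - p)         # forced by the first constraint
assert (m1, m0) == (0.0, 0.0)  # zero correlation already implies mean independence
```

With only two support points, the two linear constraints have exactly as many equations as unknowns, which is why no nonlinear escape exists.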

### Proof

Let us assume $\mathbb{E}[\epsilon_i]=0$ and $\mathbb{E}[\epsilon_i D_i]=0$, and denote the two possible values of $D_i$ by $d_1$ and $d_2$. Using the two assumptions and decomposing over $D_i=d_1$, $D_i=d_2$, we get

$$\begin{cases} \mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1) + \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2) = 0 \\ \mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1)\,d_1 + \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)\,d_2 = 0 \end{cases}$$

By solving this system of equations for $\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1)$ and $\mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)$, we see that either

1. $d_1=d_2$, or
2. $\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1) = \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)=0$.

The first case would mean $D_i$ has only one possible value (and mean independence would trivially hold). Assuming both probabilities $\mathbb{P}(D_i=d_k)>0$*, the second case then implies $\mathbb{E}(\epsilon_i \mid D_i = d_k)=0$, that is, mean independence. Thus, mean independence follows from the assumptions.

*If one of the probabilities is $0$, the corresponding $\mathbb{E}(\epsilon_i \mid D_i = d_k)$ can technically take any value, but then the model would correspond to $D_i$ having only one possible value.
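As a numerical illustration of the proof (all values below are arbitrary): if we try to violate mean independence on a two-point support, at least one of the two moment conditions must fail.

```python
import random

random.seed(0)
n = 200_000
d1, d2 = -1.5, 2.0        # arbitrary two-point support for D
p1 = 0.3                  # P(D = d1)
m1, m2 = 1.0, -1.0        # attempted violation: nonzero conditional means

draws = []
for _ in range(n):
    if random.random() < p1:
        draws.append((d1, random.gauss(m1, 1.0)))   # eps | D=d1 has mean m1
    else:
        draws.append((d2, random.gauss(m2, 1.0)))   # eps | D=d2 has mean m2

mean_eps = sum(e for _, e in draws) / n
mean_epsD = sum(d * e for d, e in draws) / n
# Exact values: E[eps] = 0.3*1 + 0.7*(-1) = -0.4 and
# E[eps*D] = 0.3*1*(-1.5) + 0.7*(-1)*2.0 = -1.85 -- neither is zero.
print(mean_eps, mean_epsD)
```

Any choice of nonzero conditional means that zeroes one moment condition necessarily breaks the other, exactly as the system of equations requires.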
