The notions of uncorrelatedness ($\mathbb{E}[XY]=0$) and mean independence ($\mathbb{E}[X\mid Y]=0$) appear in different statements of regression assumptions. We know that $\mathbb{E}[X\mid Y]=0$ implies $\mathbb{E}[XY]=0$ (but not the other way round). Here is a specific question about the relationship between these two notions in the regression setting.

We are looking at the effect of going to school on the wages of a population. Let $D_i\in\{1,0\}$ be the random variable denoting whether individual $i$ went to school ($D_i=1$) or not ($D_i=0$), and let $Y_i$ be the wage of individual $i$. Note that if we could FORCE everyone in the population to go to school, we would observe a wage distribution denoted by $Y_{1i}$; similarly, if we FORCED everyone not to go to school, we would observe a wage distribution denoted by $Y_{0i}$.

So we have $Y_i = D_i Y_{1i} + (1-D_i)Y_{0i} \qquad (1)$.

Note that we can always write $Y_{1i}=\mu_1+\epsilon_{1i}$ and $Y_{0i}=\mu_0+\epsilon_{0i}$, i.e., a mean plus a noise term with mean 0. Substituting these two equations into equation (1), we get

$Y_i=\mu_0+(\mu_1-\mu_0)D_i+\epsilon_i \qquad (2)$

where $\epsilon_i=\epsilon_{0i}+D_i(\epsilon_{1i}-\epsilon_{0i})$.

Note that $\mathbb{E}[\epsilon_i]=\mathbb{E}[D_i(\epsilon_{1i}-\epsilon_{0i})]$, so $\epsilon_i$ has mean zero under, e.g., $\mathbb{E}[\epsilon_i\mid D_i]=0$, but not automatically.

So equation (2) describes the real-world relationship between wages and schooling without making any assumption other than that the means of $Y_{1i}$ and $Y_{0i}$ are finite.
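Since equation (2) is an algebraic identity rather than a statistical assumption, it can be checked numerically. The sketch below (with hypothetical parameter values and a deliberately self-selected $D_i$) confirms the identity holds even when treatment depends on the noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
mu1, mu0 = 10.0, 7.0  # hypothetical potential-outcome means

# Potential outcomes: mean plus mean-zero noise.
eps1 = rng.normal(0.0, 1.0, n)
eps0 = rng.normal(0.0, 1.0, n)
Y1 = mu1 + eps1
Y0 = mu0 + eps0

# Self-selection: D depends on eps1, so there is no randomization here.
D = (eps1 + rng.normal(0.0, 1.0, n) > 0).astype(float)

Y = D * Y1 + (1 - D) * Y0              # equation (1)
eps = eps0 + D * (eps1 - eps0)         # error term of equation (2)
Y_via_2 = mu0 + (mu1 - mu0) * D + eps  # equation (2)
```

Here `np.allclose(Y, Y_via_2)` holds by construction: (2) is just a rearrangement of (1).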

Note that $\epsilon_i$ will in general be dependent on $D_i$ (though not necessarily correlated with it). Now suppose $\epsilon_i$ and $D_i$ are uncorrelated (first, I don't know what this means in practice); then the OLS estimator is consistent (unbiasedness of OLS would require mean independence, i.e., $\mathbb{E}[\epsilon_i\mid D_i]=0$). So $\mu_0$ and $\mu_1$ are identifiable. In this case, $\epsilon_i$ and $D_i$ being uncorrelated is equivalent to $\mathbb{E}[\epsilon_i D_i]=0$ (given $\mathbb{E}[\epsilon_i]=0$). I wonder if someone could explain the underlying meaning of this expression in this setting.

Note that a sufficient condition for $\mathbb{E}[\epsilon_i D_i]=0$ is $\mathbb{E}[\epsilon_i\mid D_i]=0$. I can understand this expression very well: "knowing $D_i$ does not change the mean of the random variable $\epsilon_i$". Note that this is weaker than independence, since $\epsilon_i$ being independent of $D_i$ means that given $D_i$, the entire distribution of $\epsilon_i$ remains the same, which is much stronger than only the first moment remaining the same (i.e., $\mathbb{E}[\epsilon_i\mid D_i]=0$).
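The gap between mean independence and full independence can be made concrete. A standard construction (hypothetical, not from the question) is $\epsilon_i=(1+3D_i)Z_i$ with $Z_i\sim N(0,1)$ independent of $D_i$: then $\mathbb{E}[\epsilon_i\mid D_i]=(1+3D_i)\,\mathbb{E}[Z_i]=0$, yet the conditional variance depends on $D_i$, so $\epsilon_i$ is not independent of $D_i$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

D = rng.integers(0, 2, n).astype(float)
Z = rng.normal(0.0, 1.0, n)

# Mean independent of D (conditional mean is 0 for both values of D),
# but NOT independent: the conditional variance is 1 vs. 16.
eps = (1 + 3 * D) * Z

mean_given_0 = eps[D == 0].mean()
mean_given_1 = eps[D == 1].mean()
var_given_0 = eps[D == 0].var()
var_given_1 = eps[D == 1].var()
```

Both conditional means come out near zero while the conditional variances differ by roughly a factor of 16, illustrating that mean independence constrains only the first moment.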

The expression $\mathbb{E}[\epsilon_i\mid D_i]=0$ can be explained intuitively if we look at this identification problem from a different angle. We have:

$\mathbb{E}[Y_i\mid D_i=1]-\mathbb{E}[Y_i\mid D_i=0]=(\mu_1-\mu_0)+\mathbb{E}[\epsilon_i\mid D_i=1]-\mathbb{E}[\epsilon_i\mid D_i=0]=(\mu_1-\mu_0)+\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]$.

Note that we observe $\mathbb{E}[Y_i\mid D_i=1]$ and $\mathbb{E}[Y_i\mid D_i=0]$ and want to identify $\mu_1-\mu_0$, which requires $\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$. If we randomly assign schooling to people in the population, this guarantees $\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$ (and even without randomized assignment, if we somehow know that $\mathbb{E}[\epsilon_i\mid D_i]=0$, we can still make this claim).
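The randomization claim above can be sketched numerically (hypothetical parameter values). The simulation also illustrates a fact relevant to the puzzle that follows: with a binary regressor, the OLS slope of $Y_i$ on $D_i$ is numerically identical to the difference in group means, so the two "routes" to $\mu_1-\mu_0$ cannot disagree:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
mu1, mu0 = 10.0, 7.0  # hypothetical true means; mu1 - mu0 = 3

eps1 = rng.normal(0.0, 2.0, n)
eps0 = rng.normal(0.0, 2.0, n)

# Randomized assignment: D independent of (eps1, eps0),
# so E[eps1 | D=1] = E[eps0 | D=0] = 0.
D = rng.integers(0, 2, n).astype(float)
Y = D * (mu1 + eps1) + (1 - D) * (mu0 + eps0)

diff_in_means = Y[D == 1].mean() - Y[D == 0].mean()

# OLS slope of Y on D: Cov(D, Y) / Var(D).
cov = np.cov(D, Y, ddof=0)
ols_slope = cov[0, 1] / cov[0, 0]
```

In the simulation both `diff_in_means` and `ols_slope` land close to the true $\mu_1-\mu_0=3$, and they agree with each other up to floating-point error.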

However, if we only have that $\epsilon_i$ and $D_i$ are uncorrelated, i.e., $\mathbb{E}[\epsilon_i D_i]=0$, this does not imply $\mathbb{E}[\epsilon_{1i}\mid D_i=1]-\mathbb{E}[\epsilon_{0i}\mid D_i=0]=0$. But then purely looking at the group means (i.e., $\mathbb{E}[Y_i\mid D_i=1]$ and $\mathbb{E}[Y_i\mid D_i=0]$) would not identify $\mu_1-\mu_0$, while running OLS would. Where is my logic going wrong?


#### Best Answer

The assumption here, that $\epsilon_i$ and $D_i$ are uncorrelated without mean independence holding, is impossible when $D_i$ takes only two values. Intuitively, correlation measures the linear relationship between the variables, so for mean independence to fail in the presence of zero correlation, the conditional mean $\mathbb{E}[\epsilon_i \mid D_i]$ would have to be a nonlinear function of $D_i$. But with only two possible values for $D_i$, there is no room for nonlinearity.

### Proof

Let us assume $\mathbb{E}[\epsilon_i]=0$ and $\mathbb{E}[\epsilon_i D_i]=0$, and denote the two possible values of $D_i$ by $d_1$ and $d_2$. Using the two assumptions and decomposing over $D_i=d_1$ and $D_i=d_2$, we get
\begin{equation}
\begin{cases}
\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1) + \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2) = 0 \\
\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1)\,d_1 + \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)\,d_2 = 0
\end{cases}
\end{equation}

By solving this system of equations for $\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1)$ and $\mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)$, we see that either

- $d_1=d_2$, or
- $\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1) = \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)=0$.

The first case would mean $D_i$ has only one possible value (and mean independence would hold trivially). Assuming both probabilities $\mathbb{P}(D_i=d_k)>0$*, the second case implies $\mathbb{E}(\epsilon_i \mid D_i = d_{k})=0$, that is, mean independence. Thus, mean independence follows from the assumptions.

*If one of the probabilities is $0$, the corresponding $\mathbb{E}(\epsilon_i \mid D_i = d_k)$ can technically take any value, but then the model would correspond to $D_i$ having only one possible value.
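The argument can be verified numerically. Writing $a_k = \mathbb{P}(D_i=d_k)\,\mathbb{E}(\epsilon_i \mid D_i=d_k)$, the two assumptions form a homogeneous $2\times 2$ linear system whose coefficient matrix is nonsingular whenever $d_1\neq d_2$, so the only solution is $a_1=a_2=0$. A sketch with hypothetical values of $p=\mathbb{P}(D_i=d_1)$, $d_1$, $d_2$:

```python
import numpy as np

# Hypothetical values: P(D=d1)=p, P(D=d2)=1-p, distinct support points.
p, d1, d2 = 0.3, 0.0, 1.0

# Unknowns a_k = P(D=d_k) * E[eps | D=d_k]; the assumptions give
#   a_1 + a_2 = 0          (from E[eps] = 0)
#   a_1*d1 + a_2*d2 = 0    (from E[eps*D] = 0)
A = np.array([[1.0, 1.0], [d1, d2]])
b = np.zeros(2)

a = np.linalg.solve(A, b)  # unique solution, since det(A) = d2 - d1 != 0

# Recover the conditional means themselves (both probabilities > 0 here).
cond_means = a / np.array([p, 1 - p])
```

Both entries of `cond_means` come out as zero, i.e., $\mathbb{E}(\epsilon_i\mid D_i=d_k)=0$ for $k=1,2$, matching the proof.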
