I'm wondering how an instrumental variable addresses selection bias in regression.
Here's the example I'm chewing on: In Mostly Harmless Econometrics, the authors discuss an IV regression on military service and earnings later in life. The question is, "Does serving in the military increase or decrease future earnings?" They investigate this question in the context of the Vietnam War. I understand that military service cannot be randomly assigned, and that this is a problem for causal inference.
To address this issue, the researcher uses draft eligibility (as in "your draft number is called") as an instrument for actual military service. That makes sense: the Vietnam draft randomly assigned young American men to the military (in theory, at least; whether the draftees actually served touches on my question). Our other IV condition seems solid: draft eligibility and actual military service are strongly, positively correlated.
Here's my question. It seems like you'd get self-selection bias: maybe richer kids could get out of serving in Vietnam, even if their draft numbers were called. (If that wasn't actually the case, let's pretend it was for the sake of my question.) If this self-selection creates systematic bias within our sample, how does our instrumental variable address this bias? Must we narrow our scope of inference to "the types of people who couldn't escape the draft"? Or does the IV somehow salvage that part of our inference? If anybody could explain how this works, I'd be very grateful.
Best Answer
Selection bias is in fact the original motivation for using instruments. The question here is whether the randomized draft lottery gets around this issue. You are perfectly right to ask what the limitations of this instrument are. If rich kids did indeed have better chances of avoiding the draft, then the negative effect of service on later earnings will be over-estimated in absolute terms.
There were other ways out of the draft, for instance poor health. Conversely, it was known among potential draftees that volunteering, rather than being drafted by the lottery, resulted in better placements and service conditions. Hence men whose lottery numbers made them more likely to be drafted often chose to volunteer instead. If such avoidance behavior undermines the randomization in the way you describe, then our 2SLS estimates will still be biased. Narrowing the sample to those who did not escape the draft doesn't help in this case, because within that subsample the treatment is again not randomly assigned.
However, if non-compliance with the treatment is still random, or negligible on average, the lottery numbers can still be used as an instrument. In this case the lottery number serves as the randomized assignment, and its effect on earnings is the intention-to-treat effect (ITT, see the corresponding chapter in Angrist and Pischke's book). So the important point is that if there is non-compliance for whatever reason, we must show that it does not invalidate the randomization. If it does not, the instrument is fine; otherwise we cannot use it.
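To make the random non-compliance case concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not taken from Angrist's data: the variable names, the effect size of -2, the 60% compliance rate, and the rule that poorer men volunteer more often. With a single binary instrument, the IV estimate is the Wald ratio, i.e. the ITT effect on earnings divided by the first-stage effect on service.

```python
import numpy as np

def simulate_draft(n=200_000, true_effect=-2.0, seed=0):
    """Simulated (hypothetical) draft data with random non-compliance."""
    rng = np.random.default_rng(seed)
    wealth = rng.normal(size=n)        # unobserved confounder
    z = rng.integers(0, 2, size=n)     # randomized draft eligibility
    # Assumption: poorer men are more likely to volunteer (selection into service).
    volunteers = rng.random(n) < 1.0 / (1.0 + np.exp(wealth))
    # Assumption: compliance with the draft is unrelated to wealth.
    complies = rng.random(n) < 0.6
    d = (volunteers | ((z == 1) & complies)).astype(float)  # actual service
    y = 10.0 + true_effect * d + 3.0 * wealth + rng.normal(size=n)
    return z, d, y

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def wald_estimate(z, d, y):
    """IV estimate with a binary instrument: ITT divided by first stage."""
    itt = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return itt / first_stage, itt, first_stage

z, d, y = simulate_draft()
iv, itt, first_stage = wald_estimate(z, d, y)
print(f"naive OLS slope: {ols_slope(d, y):.2f}")
print(f"ITT: {itt:.2f}, first stage: {first_stage:.2f}, IV (Wald): {iv:.2f}")
```

In this setup the naive regression of earnings on service picks up the wealth difference between those who serve and those who don't, while the Wald ratio should land close to the assumed effect of -2, because eligibility itself is independent of wealth.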
There are a couple of ways to test this. You could regress the instrument $Z_i$ on personal characteristics that are determined before the treatment $D_i$ and are therefore unaffected by it, such as age or race. Another check is to test the effect of the instrument on the outcome in samples where there is no relationship between $D_i$ and $Z_i$, such as men who volunteered before they received a draft lottery number. The idea is that if the only reason your lottery number affects your later earnings is through service status, then draft eligibility should have no effect on earnings in samples where it is unrelated to service status.
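Both checks can be sketched on simulated data. The sketch below is only an illustration under assumed names and numbers (`age`, `nonwhite`, `early`, the earnings equation); it is not Angrist's specification. Because the simulated lottery is truly random, both checks should come back with small t-statistics. With real data, large t-statistics would be the warning sign.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns coefficients and conventional standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical pre-treatment characteristics and a randomized instrument Z.
age = rng.integers(19, 26, size=n).astype(float)
nonwhite = (rng.random(n) < 0.15).astype(float)
z = rng.integers(0, 2, size=n).astype(float)

# Check 1: regress Z on pre-determined covariates (balance check).
beta_bal, se_bal = ols(np.column_stack([age, nonwhite]), z)
t_balance = beta_bal[1:] / se_bal[1:]

# Check 2: reduced form in a sample where D does not depend on Z.
# Assumption: "early" marks men who volunteered before the lottery, so all of
# them served and their earnings do not depend on Z.
early = rng.random(n) < 0.2
earnings = 8.0 + 0.1 * age + rng.normal(size=n)
beta_rf, se_rf = ols(z[early], earnings[early])
t_placebo = beta_rf[1] / se_rf[1]

print("balance t-stats (age, nonwhite):", np.round(t_balance, 2))
print("placebo reduced-form t-stat:", round(float(t_placebo), 2))
```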
Angrist (1990) performs some of these checks to address your concern. Despite the concerns raised above, the draft lottery turns out to be a solid instrument. Berinsky (2010) provides many more randomization checks and gives further background on the history of the lottery.