I need to run a Diff in Diff analysis, but I'm not sure whether my code is ok and how to interpret the results. I want to assess the effect of a law implementation on domestic violence. Here are the code and logic I used:
    # Create 2 dummy variables to indicate before/after the law's approval,
    # plus the control and treatment group

    # The MPL went into effect in the year 2006
    mydata$law <- as.numeric(mydata$year >= 2006)

    # Create a dummy variable from variable ch sex, now variable women has value 1 and men 0
    mydata$group <- ifelse(mydata$sex == "women", 1, 0)

    # The MPL only affects women, since it curbs domestic violence against them.
    # So the treatment group will be women (group = 1) and men will be control (group = 0)
    mydata$women <- as.numeric(mydata$group == 1)

    # Run Diff in Diff
    reg1 = lm(rate_domicile ~ women + law + law*women, data = mydata)
    summary(reg1)
this is what I get:
I appreciate any input since I'm new to R 🙂
This is the classical difference-in-differences (DiD) model. It is a standard interaction model.
You interpret the estimate of the intercept as the mean of your outcome for the control group (i.e., men) in the year(s) before the law was enacted (women = 0 and law = 0).
The coefficient on women is the expected difference in mean $y$ between the treatment and control groups in the pre-treatment period (women = 1 versus women = 0, with law = 0). This can be viewed as the "baseline difference" in your outcome between the two groups.
The coefficient on law is the expected change in mean $y$ from before to after policy implementation among the control group (law = 1 versus law = 0, with women = 0). This main effect for law (i.e., the post-treatment indicator) captures the simple passage of time for the group not covered by the new legislation.
The estimated coefficient associated with your interaction term should be your focus. This is your estimate of the treatment effect. It tests whether the expected change in your outcome $y$ from before to after the new domestic violence legislation differed between women and men. Reading it as the causal effect of the law assumes that, without the law, the outcome for women would have followed the same trend as for men.
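The four interpretations above can be collected in one equation. The symbols $\beta_0, \dots, \beta_3$ and $\varepsilon_i$ are notation introduced here for illustration; they correspond to the intercept, the coefficients on women and law, the interaction coefficient, and the error term of the fitted model.

$$
\begin{aligned}
y_i &= \beta_0 + \beta_1\,\text{women}_i + \beta_2\,\text{law}_i + \beta_3\,(\text{women}_i \times \text{law}_i) + \varepsilon_i \\[4pt]
E[y \mid \text{women}=0,\ \text{law}=0] &= \beta_0 \\
E[y \mid \text{women}=1,\ \text{law}=0] &= \beta_0 + \beta_1 \\
E[y \mid \text{women}=0,\ \text{law}=1] &= \beta_0 + \beta_2 \\
E[y \mid \text{women}=1,\ \text{law}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[4pt]
\underbrace{\big[(\beta_0+\beta_1+\beta_2+\beta_3) - (\beta_0+\beta_1)\big]}_{\text{change for women}}
- \underbrace{\big[(\beta_0+\beta_2) - \beta_0\big]}_{\text{change for men}} &= \beta_3
\end{aligned}
$$

Subtracting the before/after change for men from the before/after change for women leaves only $\beta_3$, which is why the interaction coefficient is the difference-in-differences estimate.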
Your coefficients should be interpreted in the context of your research question. You did not specify how rate_domicile is calculated. Is it a per capita estimate of women living in their primary residence? Be sure to interpret estimates in terms of the proper units.
As far as the R code is concerned, I don't have a problem with your specification. There are, however, more efficient ways to achieve the same result with fewer keystrokes. Ben Bolker's suggestion in the comments works fine and lets the software do most of the heavy lifting for you. In other words, you will make fewer mistakes in the data preparation phase. See the R code below:
reg_version_1 <- lm(rate_domicile ~ sex*I(year >= 2006), data = mydata)
If you're partial to preparing the variables ahead of time, you can also save space by specifying only the interaction of your treatment variable with the post-treatment indicator. In an R formula, law*women expands to law + women + law:women, so the constituent terms are included automatically. See the R code below:
reg_version_2 <- lm(rate_domicile ~ law*women, data = mydata)
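As a quick check that the two specifications agree, here is a minimal sketch on simulated data. The data frame, its values, and the true effect of -2 are made up for illustration; only the column names follow the question. It also computes the difference-in-differences by hand from the four group means, which should match the interaction coefficient.

```r
# Simulated data for illustration only; all values are hypothetical
set.seed(1)
n <- 400
mydata <- data.frame(
  sex  = sample(c("men", "women"), n, replace = TRUE),
  year = sample(2002:2009, n, replace = TRUE)
)
mydata$women <- as.numeric(mydata$sex == "women")
mydata$law   <- as.numeric(mydata$year >= 2006)

# Made-up outcome with a true interaction (treatment) effect of -2
mydata$rate_domicile <- 10 + 3 * mydata$women + 1 * mydata$law -
  2 * mydata$women * mydata$law + rnorm(n)

# Prepared dummies: law*women expands to law + women + law:women
fit_dummies <- lm(rate_domicile ~ law * women, data = mydata)

# Let R build the indicators inside the formula
fit_formula <- lm(rate_domicile ~ sex * I(year >= 2006), data = mydata)

# Difference-in-differences computed by hand from the four cell means
cell <- with(mydata, tapply(rate_domicile, list(women, law), mean))
did_by_hand <- (cell["1", "1"] - cell["1", "0"]) -
               (cell["0", "1"] - cell["0", "0"])

did_dummies <- unname(coef(fit_dummies)["law:women"])
did_formula <- unname(tail(coef(fit_formula), 1))  # interaction is the last term

print(c(by_hand = did_by_hand, dummies = did_dummies, formula = did_formula))
```

The three numbers agree because the model is saturated: with two binary indicators and their interaction, the regression reproduces the four cell means exactly.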
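Another concern is how to deal with your standard errors. Your observations likely exhibit dependence within units (for example, repeated measurements on the same unit over time), which the default lm standard errors ignore. But this could serve as a whole new discussion.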
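For a rough idea of what accounting for within-unit dependence looks like, here is a sketch of cluster-robust standard errors computed by hand in base R. Everything in it is an assumption for illustration: the simulated panel, the identifier unit_id (your data may define units differently), and the choice of the plain CR0 sandwich estimator without a small-sample correction. Packaged routes exist as well, such as vcovCL() in the sandwich package.

```r
# Simulated panel for illustration only; unit_id and all values are hypothetical
set.seed(2)
n_units <- 60
years   <- 2002:2009
panel <- expand.grid(unit_id = seq_len(n_units), year = years)
panel$women <- as.numeric(panel$unit_id %% 2 == 0)
panel$law   <- as.numeric(panel$year >= 2006)

# A shock shared by all observations of a unit induces within-unit dependence
unit_shock <- rnorm(n_units)
panel$rate_domicile <- 10 + 3 * panel$women + panel$law -
  2 * panel$women * panel$law + unit_shock[panel$unit_id] + rnorm(nrow(panel))

fit <- lm(rate_domicile ~ law * women, data = panel)

# CR0 sandwich: (X'X)^-1 [ sum_g X_g' u_g u_g' X_g ] (X'X)^-1
X      <- model.matrix(fit)
u      <- residuals(fit)
bread  <- solve(crossprod(X))
scores <- rowsum(X * u, group = panel$unit_id)  # per-cluster sums of x_i * u_i
meat   <- crossprod(scores)
vcov_cluster <- bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_cluster   <- sqrt(diag(vcov_cluster))
print(round(cbind(estimate = coef(fit), se_classical, se_cluster), 3))
```

The point estimates are unchanged; only the standard errors (and hence the t-statistics and p-values) differ between the two columns.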
I hope this helps!