I have panel data in the following form:
<id> <city> <treated> <time> <after>
where id identifies the individuals in my panel, city is the location where the individual lives (time-invariant), treated is a dummy indicating the individuals who are eventually treated (0: non-treated, 1: treated), time is a year-month variable, and after is a dummy (0: before, 1: after) indicating the period in which the treated units are under treatment.
With this data, I am running a simple difference-in-differences model. I am including individual fixed effects and year-month fixed effects. To relax the parallel trends assumption, I include a treatment-city specific time trend. People often include treatment-specific trends, but in my setting the outcome can vary a lot across cities for treatment and control units, so my specification is:
xtset id time
xtreg depvar i.treated i.after c.treated#c.after i.time i.city#c.treated#c.time, fe cluster(id)
where
i.city#c.treated#c.time
is the treatment-city specific linear trend.
In this kind of model I usually include only a treatment-specific trend, not a 3-way interaction. The first question is whether this approach makes sense.
Second, I thought that adding:
i.city#c.treated#c.time
or
i.city#i.treated#c.time
was exactly the same (note that the variable treated is coded as 0/1), but apparently it is not. Can someone explain to me the statistical difference between including one term or the other in my regression?
Moreover, some tests I did suggest that adding
i.treated#c.time
or
c.treated#c.time
is the same. Why does it not matter, with a 2-way interaction, whether the variable treated is handled as continuous or as a factor?
P.S. Thanks to Dimitriy for the current answer!
Best Answer
This is edited a bit in response to revisions.
It is strange to me that treated and after are dummies, yet you are treating them as continuous variables by using the c. prefix. I would have used the i. prefix. I will assume below that this is what you intended.
This will not matter in simple models, but once you add interactions, Stata might choose a different city as the base in your specification (for reasons which elude me). This means the parameters will be different for that reason alone. Fixing the base with something like ib4.city and ib0.time will remedy this. I will add an example below.$^*$
Also, I might be inclined to cluster at the city level with this setup.
Also, note that i.treated#c.time and c.treated#c.time are not equivalent in general. Here's an example with the cars data:
sysuse auto, clear
reg price i.foreign#c.mpg
reg price c.foreign#c.mpg
The first spec allows for two separate effects for mpg on price, one for domestic cars and one for foreign. The second allows for mpg to alter price for foreign cars only.
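The question reports that the two forms gave the same results there, and one likely reason is that the question's model also contains i.time. A sketch of the argument, in notation introduced here for illustration and not taken from the original posts ($D_i$ is the treated dummy, $t$ the numeric value of time):

$$
t \;=\; \sum_{s} s \cdot \mathbf{1}[t = s], \qquad (1 - D_i)\,t \;+\; D_i\,t \;=\; t .
$$

The first identity says that a linear trend is an exact linear combination of the time dummies. The second says that the two slopes created by i.treated#c.time add up to that trend, so one of them is redundant once i.time is in the model and gets omitted as collinear. What remains spans the same regressors as c.treated#c.time, so the fit is the same, although which term is reported as omitted may differ. The auto example has no such dummies, which is why both slopes are estimated there.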
Now to answer your main question. As a way to relax the parallel trends assumption, people will often include time dummies and a parametric time trend (linear or quadratic) for the treated only in the estimating specification. You are going further and making the time trend city-specific in your first command. There's a nice Stata Journal paper by Mora and Reggio, where they discuss this approach (take a look at equations (3) and (4)). If I understand their notation, that corresponds to i.city#c.treated#c.t or i.city#1.treated#c.t with i.t. They also have a working paper with a section on common types of specifications from the literature. All the linear/polynomial trends are for the treatment group only, not both.
Your goal can be accomplished by i.city#c.treated#c.t, since treating a binary variable as continuous is equivalent to putting in a dummy. When you use i.city#i.treated#c.t, that is equivalent to putting in two trends for each city: one for treated and one for control observations. I don't think this is something you want.
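To see where the two three-way specifications part ways, here is a sketch in notation introduced for illustration ($D_i$ is the treated dummy, $c(i)$ the city of individual $i$, $t$ the numeric value of time). Specification (A) stands for i.city#c.treated#c.t and (B) for i.city#i.treated#c.t:

$$
\text{(A)}\quad \sum_{c} \gamma_{c}\,\mathbf{1}[c(i)=c]\,D_i\,t
$$

$$
\text{(B)}\quad \sum_{c} \gamma_{c1}\,\mathbf{1}[c(i)=c]\,D_i\,t \;+\; \sum_{c} \gamma_{c0}\,\mathbf{1}[c(i)=c]\,(1-D_i)\,t
$$

(B) adds a separate linear trend for the control observations in each city. The time dummies can absorb only the common trend (all terms in (B) with unit coefficients sum to $t$), so, assuming every city contains both treated and control units, all but one of the extra control slopes are estimated. That is why the two versions give different results in the three-way case even though the two-way versions can coincide.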
Personally, I think the clearest way to achieve the former goal is with i.city#1.treated#c.t.
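As a rough illustration of the difference, the following sketch simulates a small panel and fits both three-way versions. The variable names follow the question, but the data-generating process, sample sizes, and seed are invented for illustration only, and the treatment period is assumed to start at the same time for everyone.

```stata
* Simulated panel: all numbers below are illustrative assumptions
clear
set seed 12345
set obs 200                              // 200 hypothetical individuals
gen id      = _n
gen city    = 1 + floor(4 * runiform())  // four hypothetical cities, fixed per id
gen treated = runiform() < 0.5           // eventually-treated indicator
expand 12                                // 12 periods per individual
bysort id: gen time = _n
gen after   = time > 6                   // assumed common treatment start
gen depvar  = 0.5*treated*after + 0.1*city*treated*time + rnormal()

xtset id time

* City-specific linear trends for the treated group only
xtreg depvar i.treated##i.after i.time i.city#1.treated#c.time, fe vce(cluster id)
estimates store treated_only

* City-specific linear trends for treated and control observations
xtreg depvar i.treated##i.after i.time i.city#i.treated#c.time, fe vce(cluster id)
estimates store both_groups

* Side-by-side comparison of the two sets of coefficients
estimates table treated_only both_groups, stats(N)
```

Stata will flag some terms as omitted in both models (the treated main effect is absorbed by the individual fixed effects, and after is collinear with the time dummies). The point of the comparison is that the second model reports additional slopes for control observations by city.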
Here's an example with the Card & Krueger (1994) fast-food employment data, where I am using the fast food chain instead of city (Wendy's is the base category). This data only has two periods, so there's no separate time dummy. It's a bit confusing since the X.chain#c.treated#c.t coefficient is the same parameter as X.chain#1.treated#c.t in the table, but it has a different label:
. set more off
. estimates clear
. use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
(Dataset from Card&Krueger (1994))
. drop if id==407
(4 observations deleted)
. xtset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 0 to 1
                delta:  1 unit
. gen chain = ""
(816 missing values generated)
. foreach var of varlist bk kfc roys wendys {
  2.     replace chain="`var'" if `var'==1
  3. }
variable chain was str1 now str2
(342 real changes made)
variable chain was str2 now str3
(158 real changes made)
variable chain was str3 now str4
(198 real changes made)
variable chain was str4 now str6
(118 real changes made)
. sencode chain, replace
. eststo: qui xtreg fte i.treated##i.t i.chain#c.treated#c.t, fe cluster(id)
(est1 stored)
. eststo: qui xtreg fte i.treated##i.t i.chain#1.treated#c.t, fe cluster(id)
(est2 stored)
. eststo: qui xtreg fte i.treated##i.t i.chain#i.treated#c.t, fe cluster(id)
(est3 stored)
. esttab *, noomitted drop(0.treated 0.t 0.treated#0.t) varwidth(25)
-------------------------------------------------------------------------
                                   (1)             (2)             (3)
                                   fte             fte             fte
-------------------------------------------------------------------------
1.t                             -2.523*         -2.523*         -2.577
                               (-2.02)         (-2.02)         (-1.04)
1.treated#1.t                    3.517           3.517           3.571
                                (1.63)          (1.63)          (1.17)
1.chain#c.treated#c.t            0.268
                                (0.14)
2.chain#c.treated#c.t           -0.225
                               (-0.12)
3.chain#c.treated#c.t           -2.439
                               (-1.28)
1.chain#1.treated#c.t                            0.268           0.268
                                                (0.14)          (0.14)
2.chain#1.treated#c.t                           -0.225          -0.225
                                               (-0.12)         (-0.12)
3.chain#1.treated#c.t                           -2.439          -2.439
                                               (-1.28)         (-1.28)
1.chain#0.treated#c.t                                           -0.791
                                                               (-0.23)
2.chain#0.treated#c.t                                            4.804
                                                                (1.87)
3.chain#0.treated#c.t                                           -1.291
                                                               (-0.41)
_cons                            17.69***        17.69***        17.69***
                               (79.93)         (79.93)         (79.99)
-------------------------------------------------------------------------
N                                  797             797             797
-------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
$^*$Fixing The Base:
estimates clear
eststo: xtreg fte i.treated##i.t ib4.chain#c.treated#c.t, fe cluster(id)
eststo: xtreg fte c.treated##c.t ib4.chain#c.treated#c.t, fe cluster(id)
eststo: xtreg fte c.treated##c.t i.chain#c.treated#c.t, fe cluster(id)
esttab *, noomitted drop(0.treated 0.t 0.treated#0.t) varwidth(30)