If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression? Maybe I'm getting tripped up by the terminology and by whether I should be using lm() or the plm package in R.
My goal is to estimate the effect of a baby bonus. My dependent variable is a binary indicator for NEWBORN and my main independent variable of interest is an indicator for receiving the baby bonus. I control for age, age squared, education, marital status, and household income.
1)

    ## Linear Probability
    LPM <- lm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log,
              data = fertility_15_45)
    # how do I add FE to a lm model in R?

2)

    ## FE Model
    FE_model <- plm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log,
                    data = fertility_15_45, index = "region", model = "within")
As indicated in the comments, the answer on Stack Overflow demonstrates, explicitly, that your coefficients are identical. I will offer some further intuition.
> If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression?
The plm() function is a panel data estimator. Technically, it runs lm() on your transformed data. Typically, when students learn about "fixed effects" for the first time, they learn that it is a deviation from a "within-group" time mean. Later, they come across an empirical specification in a paper and notice a unit-subscripted parameter in a model estimated via least squares, such as $\gamma_s$ (i.e., a state effect) or $\gamma_r$ (i.e., a region effect), and they ask whether this is equivalent to performing a fixed effects regression. It is.
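To make the equivalence concrete, here is a short derivation. The notation is my own assumption rather than something from the question: $i$ indexes individuals, $r$ indexes regions, $D_{ir}$ is the treatment dummy, and $x_{ir}$ collects the controls. Start from the model with a region-specific intercept, average it within each region, and subtract:

$$
\begin{aligned}
y_{ir} &= \beta D_{ir} + x_{ir}'\delta + \gamma_r + \varepsilon_{ir} \\
\bar{y}_r &= \beta \bar{D}_r + \bar{x}_r'\delta + \gamma_r + \bar{\varepsilon}_r \\
y_{ir} - \bar{y}_r &= \beta (D_{ir} - \bar{D}_r) + (x_{ir} - \bar{x}_r)'\delta + (\varepsilon_{ir} - \bar{\varepsilon}_r)
\end{aligned}
$$

The region effect $\gamma_r$ drops out of the last line. By the Frisch-Waugh-Lovell theorem, least squares on this demeaned equation yields the same $\hat{\beta}$ and $\hat{\delta}$ as least squares on the first line with a full set of region dummies.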
The plm() function with index = "region" and model = "within" will return the same coefficients as your lm() function with as.factor(region) included as a covariate. In R, as.factor() converts region to a factor, which lm() then expands into a series of dummy variables, one per region apart from a reference category. You can think of this as each region getting its own unique intercept.
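If you want to see those dummies for yourself, a minimal sketch is to inspect the design matrix that lm() would build. The region labels here are made up for illustration.

```r
# Hypothetical toy data: six observations across three regions.
region <- c("north", "south", "west", "north", "south", "west")

# model.matrix() returns the design matrix lm() would construct from the factor.
# The first level ("north") is absorbed into the intercept; the remaining
# levels each get their own 0/1 dummy column.
X <- model.matrix(~ as.factor(region))
print(X)
```

With an intercept in the model, one region serves as the baseline, so the dummy coefficients are region intercepts measured relative to that baseline.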
In sum, treating your "region effects" as parameters to be estimated is algebraically equivalent to estimation in deviations from means. The boilerplate code below will result in identical coefficients on your treatment dummy (i.e., baby bonus).
    # --- The Least Squares Dummy Variable Estimator
    lm(outcome ~ treatment + ... + as.factor(region), data = ...)

    # --- The Fixed Effects (Within-Group) Estimator
    plm(outcome ~ treatment + ... , index = "region", model = "within", data = ...)
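To check this on something runnable, here is a rough sketch on simulated data. Everything in it is an assumption for illustration (sample size, region labels, effect sizes, and a reduced set of controls); it is not your fertility data. It compares the dummy-variable regression with a within transformation done by hand in base R, and also calls plm() if the package happens to be installed.

```r
set.seed(1)

# Simulated (hypothetical) data: 500 people spread over 5 regions.
n <- 500
region <- sample(paste0("r", 1:5), n, replace = TRUE)
region_effect <- c(r1 = -0.10, r2 = 0.00, r3 = 0.05, r4 = 0.10, r5 = 0.15)
treatment <- rbinom(n, 1, 0.4)
age <- sample(15:45, n, replace = TRUE)
p <- 0.2 + 0.1 * treatment + 0.002 * (age - 30) + unname(region_effect[region])
newborn <- rbinom(n, 1, p)
d <- data.frame(newborn, treatment, age, region)

# --- Least squares dummy variable (LSDV) estimator
lsdv <- lm(newborn ~ treatment + age + as.factor(region), data = d)

# --- Within estimator by hand: subtract region means from every variable
demean <- function(x, g) as.numeric(x) - ave(as.numeric(x), g)
d_within <- data.frame(
  newborn   = demean(d$newborn, d$region),
  treatment = demean(d$treatment, d$region),
  age       = demean(d$age, d$region)
)
fe_manual <- lm(newborn ~ treatment + age - 1, data = d_within)

# The slope coefficients agree up to numerical precision.
print(coef(lsdv)[c("treatment", "age")])
print(coef(fe_manual))

# --- Same comparison with plm, only if the package is available
if (requireNamespace("plm", quietly = TRUE)) {
  fe_plm <- plm::plm(newborn ~ treatment + age, data = d,
                     index = "region", model = "within")
  print(coef(fe_plm))
}
```

Note that the standard errors from the hand-demeaned lm() are not directly comparable, because that regression does not count the region means as estimated parameters; the point of the sketch is only that the coefficients coincide.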
I hope this helps your intuition.