If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression? Maybe I'm getting tripped up by the language and by whether I should be using the `lm()` function or the `plm` package in R.
My goal is to estimate the effect of a baby bonus. My dependent variable is a binary indicator for NEWBORN and my main independent variable of interest is an indicator for receiving the baby bonus. I control for age, age squared, education, marital status, and household income.
```r
library(plm)

## 1) Linear probability model
LPM <- lm(newborn ~ treatment + age + age_sq + highest_education +
            marital_stat + hh_income_log,
          data = fertility_15_45)
# how do I add FE to a lm model in R?

## 2) FE model
FE_model <- plm(newborn ~ treatment + age + age_sq + highest_education +
                  marital_stat + hh_income_log,
                data = fertility_15_45, index = "region", model = "within")
```
Best Answer
As indicated in the comments, the answer on Stack Overflow demonstrates, explicitly, that your coefficients are identical. I will offer some further intuition.
If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression?
Yes. The `plm()` function is a panel data estimator. In essence, it transforms your data and then applies ordinary least squares to the transformed data, just as `lm()` would. Typically, when students first learn about "fixed effects," they learn that the estimator works with deviations from a within-group (time) mean. Later, they come across an empirical specification in a paper, see a unit-subscripted parameter in a model estimated by least squares, such as $\gamma_s$ (a state effect) or $\gamma_r$ (a region effect), and ask whether this is equivalent to running a fixed effects regression. It is.
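For concreteness, here is a short sketch of why the two estimators coincide. The notation is introduced here for illustration only: individual $i$ lives in region $r$, $D_{ir}$ is the baby-bonus indicator, $x_{ir}$ collects the other controls (without a constant), and $\bar{y}_r$, $\bar{D}_r$, $\bar{x}_r$, $\bar{\varepsilon}_r$ denote means over the $n_r$ observations in region $r$.

$$
\begin{aligned}
y_{ir} &= \beta D_{ir} + x_{ir}'\delta + \gamma_r + \varepsilon_{ir} \\
\bar{y}_r &= \beta \bar{D}_r + \bar{x}_r'\delta + \gamma_r + \bar{\varepsilon}_r \\
y_{ir} - \bar{y}_r &= \beta (D_{ir} - \bar{D}_r) + (x_{ir} - \bar{x}_r)'\delta + (\varepsilon_{ir} - \bar{\varepsilon}_r)
\end{aligned}
$$

The region effect $\gamma_r$ is constant within a region, so it drops out of the demeaned equation. Regressing any variable on a full set of region dummies returns its region mean, so by the Frisch–Waugh–Lovell theorem, least squares on the demeaned equation gives the same $\hat{\beta}$ and $\hat{\delta}$ as least squares with the region dummies included.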
The `plm()` function with `index = "region"` and `model = "within"` will return the same coefficients as your `lm()` call with `as.factor(region)` included as a covariate. In R, wrapping `region` in `as.factor()` tells `lm()` to expand it into a set of dummy variables, one for each region except a reference region that is absorbed by the intercept. You can think of this as each region getting its own unique intercept.
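A minimal sketch of what that expansion looks like, using a hypothetical toy data set (`toy`, with made-up columns `y`, `x`, and `region`) rather than your data. `model.matrix()` returns the design matrix that `lm()` builds from a formula, so it shows exactly which dummy columns `as.factor(region)` produces.

```r
# Hypothetical toy data: three regions, one numeric covariate
toy <- data.frame(
  y      = c(1, 0, 1, 1, 0, 0),
  x      = c(2.5, 1.0, 3.2, 0.7, 1.9, 2.2),
  region = c("North", "North", "South", "South", "West", "West")
)

# With an intercept: "North" (first level) is the reference region,
# so only South and West get their own dummy columns
X <- model.matrix(y ~ x + as.factor(region), data = toy)
print(X)

# Without an intercept: every region gets its own dummy,
# i.e., its own region-specific intercept
X0 <- model.matrix(y ~ 0 + x + as.factor(region), data = toy)
print(X0)
```

Both parameterizations span the same column space, so the coefficient on `x` is the same either way; only the way the region intercepts are reported changes.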
In sum, treating your "region effects" as parameters to be estimated is algebraically equivalent to estimation in deviations from means. The boilerplate code below will result in identical coefficients on your treatment dummy (i.e., baby bonus).
```r
# --- The Least Squares Dummy Variable Estimator
lm(outcome ~ treatment + ... + as.factor(region), data = ...)

# --- The Fixed Effects (Within-Group) Estimator
plm(outcome ~ treatment + ..., index = "region", model = "within", data = ...)
```
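A minimal, self-contained check in base R, assuming simulated data rather than your `fertility_15_45` file: the sample size, region effects, and coefficient values below are hypothetical. It fits the dummy-variable regression, then performs the within transformation by hand (subtracting region means with `ave()`) and runs least squares on the demeaned data.

```r
set.seed(42)

# Hypothetical simulated data: 200 individuals in 5 regions
n         <- 200
region    <- sample(paste0("R", 1:5), n, replace = TRUE)
gamma     <- c(R1 = 0.10, R2 = 0.25, R3 = 0.05, R4 = 0.30, R5 = 0.15)  # region effects
age       <- runif(n, 15, 45)
treatment <- rbinom(n, 1, 0.4)
p         <- gamma[region] + 0.08 * treatment + 0.002 * age  # stays inside [0, 1]
newborn   <- rbinom(n, 1, p)
dat       <- data.frame(newborn, treatment, age, region)

# 1) Least squares dummy variables: one dummy per region
lsdv <- lm(newborn ~ treatment + age + as.factor(region), data = dat)

# 2) Within estimator: subtract region means from outcome and regressors
demean <- function(v, g) v - ave(v, g)  # ave() returns each row's group mean
dat_w <- data.frame(
  newborn   = demean(dat$newborn,   dat$region),
  treatment = demean(dat$treatment, dat$region),
  age       = demean(dat$age,       dat$region)
)
fe_within <- lm(newborn ~ 0 + treatment + age, data = dat_w)

# The slopes agree up to floating-point error
print(coef(lsdv)[c("treatment", "age")])
print(coef(fe_within))
```

If the plm package is installed, `plm(newborn ~ treatment + age, data = dat, index = "region", model = "within")` should reproduce the same slopes; it is left out of the block so the sketch runs with base R alone.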
I hope this helps your intuition.