# Solved – Linear probability model with fixed effects

If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression? Maybe I'm getting tripped up by the language, and by whether I should be using `lm()` or the `plm` package in R.

My goal is to estimate the effect of a baby bonus. My dependent variable is a binary indicator for NEWBORN and my main independent variable of interest is an indicator for receiving the baby bonus. I control for age, age squared, education, marital status, and household income.

```r
library(plm)

# 1) Linear probability model
LPM <- lm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log,
          data = fertility_15_45)
# how do I add FE to a lm model in R?

# 2) FE model
FE_model <- plm(newborn ~ treatment + age + age_sq + highest_education + marital_stat + hh_income_log,
                data = fertility_15_45, index = "region", model = "within")
```

As indicated in the comments, the answer on Stack Overflow demonstrates, explicitly, that your coefficients are identical. I will offer some further intuition.

> If I want to estimate a linear probability model with (region) fixed effects, is that the same as just running a fixed effects regression?

Yes. The `plm()` function is a panel data estimator. Technically, it runs `lm()` on your transformed data. Typically, when students learn about "fixed effects" for the first time, they learn it as estimation in deviations from "within-group" means (in a panel, each unit's mean over time). Later, they come across an empirical specification in a paper and notice a parameter in a model, estimated via least squares, that carries a unit subscript, such as $$\gamma_s$$ (i.e., a state effect) or $$\gamma_r$$ (i.e., a region effect), and they ask whether this is equivalent to performing a fixed effects regression. It is.

The `plm()` function with `index = "region"` and `model = "within"` will return the same coefficients on your covariates as your `lm()` function with `as.factor(region)` included as a covariate. In R, `as.factor()` converts `region` to a factor, which `lm()` then expands into a series of dummy variables (one per region, minus a reference level). You can think of this as each region getting its own unique intercept.
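
To make the "own intercept" idea concrete, here is a short sketch of the algebra. The notation is an assumption for illustration and does not come from the question: $$y_{ir}$$ is the outcome for woman $$i$$ in region $$r$$, $$x_{ir}$$ collects the treatment dummy and the controls, and bars denote region means.

$$y_{ir} = x_{ir}'\beta + \gamma_r + \varepsilon_{ir}$$

Averaging within each region gives

$$\bar{y}_r = \bar{x}_r'\beta + \gamma_r + \bar{\varepsilon}_r$$

and subtracting the second line from the first removes $$\gamma_r$$:

$$y_{ir} - \bar{y}_r = (x_{ir} - \bar{x}_r)'\beta + (\varepsilon_{ir} - \bar{\varepsilon}_r)$$

By the Frisch-Waugh-Lovell theorem, least squares on this demeaned equation yields the same $$\beta$$ as least squares on the first equation with a full set of region dummies.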

In sum, treating your "region effects" as parameters to be estimated is algebraically equivalent to estimation in deviations from means. The boilerplate code below will result in identical coefficients on your treatment dummy (i.e., baby bonus).

```r
# --- The Least Squares Dummy Variable Estimator
lm(outcome ~ treatment + ... + as.factor(region), data = ...)

# --- The Fixed Effects (Within-Group) Estimator
plm(outcome ~ treatment + ... , index = "region", model = "within", data = ...)
```
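
If you want to convince yourself without your own data, the following is a minimal sketch on simulated data. Everything in it is hypothetical (the data frame `d`, the five regions, the effect sizes, the helper `demean()`), and it does the within transformation by hand in base R rather than calling `plm()`, so it only illustrates the algebra and is not a replacement for the package.

```r
set.seed(1)
n <- 500

# Simulated data; all names and values are made up for illustration
d <- data.frame(
  region    = sample(paste0("r", 1:5), n, replace = TRUE),
  treatment = rbinom(n, 1, 0.4),
  age       = sample(15:45, n, replace = TRUE)
)
region_effect <- c(r1 = 0.05, r2 = 0.10, r3 = 0.15, r4 = 0.20, r5 = 0.25)
p <- 0.05 + 0.10 * d$treatment + 0.002 * (d$age - 15) + region_effect[d$region]
d$newborn <- rbinom(n, 1, p)

# LSDV: region effects estimated as explicit dummy-variable coefficients
lsdv <- lm(newborn ~ treatment + age + factor(region), data = d)

# Within estimator by hand: subtract the region mean from every variable
demean <- function(x, g) as.numeric(x) - ave(as.numeric(x), g)
d$newborn_dm   <- demean(d$newborn,   d$region)
d$treatment_dm <- demean(d$treatment, d$region)
d$age_dm       <- demean(d$age,       d$region)
fe_within <- lm(newborn_dm ~ 0 + treatment_dm + age_dm, data = d)

# Compare the slope coefficients from the two approaches
b_lsdv   <- unname(coef(lsdv)[c("treatment", "age")])
b_within <- unname(coef(fe_within))
print(all.equal(b_lsdv, b_within))
```

The final line should print `TRUE`. Note that the hand-rolled regression does not correct the degrees of freedom for the absorbed region means, so its standard errors will be slightly too small; only the coefficients are meant to be compared here.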

I hope this helps your intuition.
