I am trying to study predictors of companies' pollution output of some specific chemicals. The data I am using have many 0s (i.e., the company did not emit those chemicals at all) and are otherwise continuous with a long right tail. I have seen others model such data by adding 1 to the dependent variable and then taking the log. My sense is that this is wrong, but I don't understand why. Could someone explain? This approach is much simpler than what I think I should be doing (using zero-inflated two-part models for semi-continuous data), so I'd be thrilled if it turned out that simply adding 1 and logging is right.
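To make the concern about log(y + 1) concrete, here is a minimal Python/numpy sketch on simulated data. The data-generating process, sample size and coefficients are hypothetical, not the poster's data. It shows that the slope from regressing log(y + c) on a predictor changes with the arbitrary constant c. This happens because every zero is mapped to log(c), so the coefficient blends the effect on *whether* a company emits with the effect on *how much* it emits. The sketch then fits a simple two-part model: a logistic regression for P(y > 0) and a least-squares regression of log(y) on the positive observations only. The combined expected value assumes normal, homoskedastic errors on the log scale, which is an assumption of this illustration.

```python
import numpy as np

# Hypothetical semi-continuous outcome: a company emits nothing (y = 0)
# or a lognormally distributed amount. Not real data.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p_emit = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))  # true P(y > 0 | x)
emits = rng.random(n) < p_emit
y = np.where(emits, np.exp(1.0 + 0.5 * x + rng.normal(size=n)), 0.0)

X = np.column_stack([np.ones(n), x])  # intercept + predictor


def ols(X, z):
    """Least-squares coefficients of z on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta


# log(y + c): zeros become log(c), so the slope depends on the arbitrary c.
slopes = {c: ols(X, np.log(y + c))[1] for c in (0.01, 1.0, 100.0)}
for c, b in slopes.items():
    print(f"c = {c:>6}: slope of log(y + c) on x = {b:.3f}")


def logit_fit(X, d, iters=25):
    """Logistic regression of a 0/1 outcome d on X by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (d - p))
    return beta


# Two-part model.
# Part 1: does the company emit at all? (true coefficients -0.5, 1.0)
part1 = logit_fit(X, (y > 0).astype(float))
# Part 2: how much, given that it emits? (true coefficients 1.0, 0.5)
pos = y > 0
part2 = ols(X[pos], np.log(y[pos]))
resid = np.log(y[pos]) - X[pos] @ part2
sigma2 = resid.var(ddof=2)  # residual variance on the log scale


def expected_y(x0):
    """E[y | x0] = P(y > 0 | x0) * E[y | y > 0, x0], assuming normal,
    homoskedastic errors on the log scale (lognormal mean formula)."""
    xv = np.array([1.0, x0])
    prob = 1.0 / (1.0 + np.exp(-xv @ part1))
    return prob * np.exp(xv @ part2 + sigma2 / 2.0)


print("part 1 (logit for y > 0):", part1.round(2))
print("part 2 (OLS of log y, y > 0):", part2.round(2))
print("E[y | x = 0] estimate:", round(expected_y(0.0), 2))
```

With real data one would normally use a regression library rather than hand-rolled Newton steps; the hand-written fit only keeps the sketch dependency-free. The point of the design is that each part has its own interpretable coefficients, whereas the single log(y + c) slope has no stable meaning.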
Second, I have found a Stata ado file to run zero-inflated two-part models for semi-continuous data. Is there a way to incorporate fixed effects into this type of model?
Best Answer
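Not sure about Stata, but R can fit zero-inflated models with fixed effects. Check out, for example, the gamlss package and the zeroinfl() function from the pscl package (note that zeroinfl() fits zero-inflated count models such as Poisson and negative binomial, so it is aimed at count data rather than continuous amounts).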
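On the fixed-effects question, the simplest way fixed effects enter a two-part model is as group indicator variables in each part. The following Python/numpy sketch is a minimal illustration under hypothetical assumptions: 20 made-up industry groups, simulated emissions, and a predictor correlated with the industry effect. It puts the same industry dummies in both the logit part and the log-scale regression, then compares the second-part slope with and without them. One caveat: with firm-level fixed effects in a short panel, the logit part suffers from the incidental parameters problem, and firms that always or never emit are perfectly predicted by their own dummy, so they contribute nothing to that part.

```python
import numpy as np

# Hypothetical panel: 20 industries (the fixed-effect groups), 300 firm-years each.
rng = np.random.default_rng(1)
G, n_per = 20, 300
g = np.repeat(np.arange(G), n_per)
n = g.size
alpha = rng.normal(size=G)               # unobserved industry effects
x = rng.normal(size=n) + 0.8 * alpha[g]  # predictor correlated with the effect
p_emit = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + alpha[g])))
emits = rng.random(n) < p_emit
y = np.where(emits, np.exp(1.0 + 0.5 * x + alpha[g] + rng.normal(size=n)), 0.0)

# Fixed effects as dummies: G - 1 indicator columns (industry 0 is the reference).
dummies = (g[:, None] == np.arange(1, G)[None, :]).astype(float)
X_fe = np.column_stack([np.ones(n), x, dummies])
X_pooled = np.column_stack([np.ones(n), x])


def logit_fit(X, d, iters=50):
    """Logistic regression of a 0/1 outcome d on X by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (d - p))
    return beta


def ols(X, z):
    """Least-squares coefficients of z on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta


pos = y > 0
# The same dummies enter both parts of the two-part model.
fe_logit_slope = logit_fit(X_fe, pos.astype(float))[1]    # true value 1.0
fe_ols_slope = ols(X_fe[pos], np.log(y[pos]))[1]          # true value 0.5
pooled_ols_slope = ols(X_pooled[pos], np.log(y[pos]))[1]  # omits industry effects

print(f"part 1 slope with industry dummies: {fe_logit_slope:.2f}")
print(f"part 2 slope with industry dummies: {fe_ols_slope:.2f}")
print(f"part 2 slope without dummies:       {pooled_ols_slope:.2f}")
```

Because the predictor is correlated with the omitted industry effect in this simulation, the slope without dummies is biased upward, while the dummy specification recovers the simulated values in both parts.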