# Solved – the best method to analyze these extremely skewed data with many zeros

I'm working on my bachelor's thesis and have an analysis where the dependent variable (the number of months of parental leave taken by fathers) has a very skewed distribution: the value 0 occurs 1089 times, the value 1 occurs 18 times, the value 2 occurs 89 times, the value 3 occurs 29 times, the value 4 occurs 11 times, and so on, with every further value occurring fewer than 10 times.

Now, the same variable from the same data set has already been analyzed in several papers published in scientific journals, and they all used variants of linear regression on the untransformed data.

My question: Is this approach really valid? From all I have learned in my introductory statistics classes, you need a normally distributed dependent variable for linear regression. And these data are clearly non-normal and cannot be transformed to be normal either. What other methods could be used instead? Might negative binomial regression be an option? Or is linear regression OK to use after all?
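To make the question more concrete, here is a minimal sketch of a quick check on the counts listed above. It is not a model fit and not a recommendation. It only uses the five frequencies given in this post; the values above 4 are left out because their exact counts are not stated, so the numbers are approximate. The variable names are made up for illustration. The idea is to compare the variance with the mean, and the observed share of zeros with the share a Poisson distribution with the same mean would predict.

```python
import math

# Frequencies as listed in the post. Values above 4 are omitted because
# the post only says each of them occurs fewer than 10 times (assumption:
# ignoring them is acceptable for a rough check).
counts = {0: 1089, 1: 18, 2: 89, 3: 29, 4: 11}

n = sum(counts.values())

# Mean and (population) variance computed from the frequency table.
mean = sum(value * freq for value, freq in counts.items()) / n
variance = sum(freq * (value - mean) ** 2 for value, freq in counts.items()) / n

# A Poisson variable has variance equal to its mean, so a ratio well
# above 1 points to overdispersion.
dispersion_ratio = variance / mean

# Share of zeros in the listed data versus the share a Poisson
# distribution with the same mean would give, P(Y = 0) = exp(-mean).
observed_zero_share = counts[0] / n
poisson_zero_share = math.exp(-mean)

print(f"n = {n}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"observed share of zeros = {observed_zero_share:.3f}")
print(f"Poisson-implied share of zeros = {poisson_zero_share:.3f}")
```

On the listed values the variance comes out at more than twice the mean, and there are more zeros than a Poisson distribution with that mean would produce. That is why I am wondering about negative binomial regression, or some other model for counts with excess zeros, rather than linear regression.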

Thanks,
Stefanie
