I have a problem with logistic regression.
I read (here) that one of the assumptions of a logistic regression model is a minimum of, for example, 50 observations per predictor. But if I create dummy variables in Stata using the "i." operator, does every dummy category count as a new predictor?
Example:
- i.maritalstatus
- 1=Ref.
- 2
- 3 …
Thank you very much.
Best Answer
It's not an assumption; it's a rule of thumb to avoid over-fitting. You'd have to ask the author of the document you link to in exactly what circumstances they envisaged it applying, but it clearly isn't going to work if you have a large number of categories per predictor, or if the number of observations in one response category is very small.
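More common rules of thumb state that for each coefficient you're estimating (bar the intercept) you should have at least 10–20 observations in the least common response category. Each non-reference level of a categorical predictor is a separate coefficient, so a variable with k categories entered with "i." counts as k − 1 coefficients, not one. In any case, when over-fitting is a concern, you should check for it using bootstrap validation or cross-validation.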
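To make the counting concrete, here is a minimal Python sketch of the "observations in the least common response category per coefficient" check. The helper names, the four-level `maritalstatus` factor, the two continuous predictors, and the 60/240 outcome split are all hypothetical and chosen only for illustration; they do not come from the question.

```python
from collections import Counter


def count_coefficients(levels_per_factor, n_continuous=0):
    """Count estimated coefficients, excluding the intercept.

    A categorical predictor with k levels contributes k - 1 dummy
    coefficients, because one level serves as the reference category.
    """
    return n_continuous + sum(k - 1 for k in levels_per_factor.values())


def obs_per_coefficient(outcomes, n_coefficients):
    """Observations in the least common response category per coefficient."""
    counts = Counter(outcomes)
    if len(counts) != 2:
        raise ValueError("expected a binary outcome")
    if n_coefficients < 1:
        raise ValueError("need at least one coefficient")
    return min(counts.values()) / n_coefficients


# Hypothetical data: 300 observations, 60 of them in the rarer category.
outcomes = [1] * 60 + [0] * 240

# Hypothetical model: one 4-level factor plus 2 continuous predictors.
levels = {"maritalstatus": 4}
n_coef = count_coefficients(levels, n_continuous=2)
ratio = obs_per_coefficient(outcomes, n_coef)

print(n_coef)  # 5 coefficients: 3 dummies + 2 continuous
print(ratio)   # 12.0, inside the 10-20 range of the rule of thumb
```

Note that the ratio is driven by the rarer outcome category (60 here), not by the total sample size of 300, and that adding levels to a factor lowers it even though the number of variables stays the same.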