I am doing some count data analysis. The data is in this link.

Column A contains the count data, and the other columns are the independent variables. At first I used Poisson regression to analyze it:

```r
m0 <- glm(A ~ ., data = d, family = "poisson")
summary(m0)
```

We see that the residual deviance is much greater than the residual degrees of freedom, so we have over-dispersion:

```
Call:
glm(formula = A ~ ., family = "poisson", data = d)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-28.8979   -4.5110    0.0384    5.4327   20.3809  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  8.7054842  0.9100882   9.566  < 2e-16 ***
B           -0.1173783  0.0172330  -6.811 9.68e-12 ***
C            0.0864118  0.0182549   4.734 2.21e-06 ***
D            0.1169891  0.0301960   3.874 0.000107 ***
E            0.0738377  0.0098131   7.524 5.30e-14 ***
F            0.3814588  0.0093793  40.670  < 2e-16 ***
G           -0.3712263  0.0274347 -13.531  < 2e-16 ***
H           -0.0694672  0.0022137 -31.380  < 2e-16 ***
I           -0.0634488  0.0034316 -18.490  < 2e-16 ***
J           -0.0098852  0.0064538  -1.532 0.125602    
K           -0.1105270  0.0128016  -8.634  < 2e-16 ***
L           -0.3304606  0.0155454 -21.258  < 2e-16 ***
M            0.2274175  0.0259872   8.751  < 2e-16 ***
N            0.2922063  0.0174406  16.754  < 2e-16 ***
O            0.1179708  0.0119332   9.886  < 2e-16 ***
P            0.0618776  0.0260646   2.374 0.017596 *  
Q           -0.0303909  0.0060060  -5.060 4.19e-07 ***
R           -0.0018939  0.0037642  -0.503 0.614864    
S            0.0383040  0.0065841   5.818 5.97e-09 ***
T            0.0318111  0.0116611   2.728 0.006373 ** 
U            0.2421129  0.0145502  16.640  < 2e-16 ***
V            0.1782144  0.0090858  19.615  < 2e-16 ***
W           -0.5105135  0.0258136 -19.777  < 2e-16 ***
X           -0.0583590  0.0043641 -13.373  < 2e-16 ***
Y           -0.1554609  0.0042604 -36.489  < 2e-16 ***
Z            0.0064478  0.0001184  54.459  < 2e-16 ***
AA           0.3880479  0.0164929  23.528  < 2e-16 ***
AB           0.1511362  0.0050471  29.945  < 2e-16 ***
AC           0.0557880  0.0181129   3.080 0.002070 ** 
AD          -0.6569099  0.0368771 -17.813  < 2e-16 ***
AE          -0.0040679  0.0003960 -10.273  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 97109.0  on 56  degrees of freedom
Residual deviance:  5649.7  on 26  degrees of freedom
AIC: 6117.1

Number of Fisher Scoring iterations: 6
```
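As a quick numerical check (a sketch using the `m0` fit above), the sum of squared Pearson residuals divided by the residual degrees of freedom should be near 1 for a well-specified Poisson model:

```r
## Pearson dispersion statistic for the Poisson fit;
## values much larger than 1 indicate over-dispersion
sum(residuals(m0, type = "pearson")^2) / df.residual(m0)
```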

Then I think I should use negative binomial regression for the over-dispersed data. As you can see, I have many independent variables, and I wanted to select the important ones, so I decided to use stepwise regression. At first, I create a full model:

```r
full.model <- glm.nb(A ~ ., data = d, maxit = 1000)
```

When not indicating `maxit`, or with `maxit = 100`, it shows:

```
Warning messages:
1: glm.fit: algorithm did not converge 
2: In glm.nb(A ~ ., data = d, maxit = 100) : alternation limit reached
```

When indicating `maxit = 1000`, the warning messages disappear.

```
summary(full.model)

Call:
glm.nb(formula = A ~ ., data = d, maxit = 1000, init.theta = 2.730327193, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5816  -0.8893  -0.3177   0.4882   1.9073  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)   
(Intercept) 11.8228596  8.3004322   1.424  0.15434   
B           -0.2592324  0.1732782  -1.496  0.13464   
C            0.2890696  0.1928685   1.499  0.13393   
D            0.3136262  0.3331182   0.941  0.34646   
E            0.3764257  0.1313142   2.867  0.00415 **
F            0.3257785  0.1448082   2.250  0.02447 * 
G           -0.7585881  0.2343529  -3.237  0.00121 **
H           -0.0714660  0.0343683  -2.079  0.03758 * 
I           -0.1050681  0.0357237  -2.941  0.00327 **
J            0.0810292  0.0566905   1.429  0.15291   
K            0.2582978  0.1574582   1.640  0.10092   
L           -0.2009784  0.1543773  -1.302  0.19296   
M           -0.2359658  0.3216941  -0.734  0.46325   
N           -0.0689036  0.1910518  -0.361  0.71836   
O            0.0514983  0.1383610   0.372  0.70974   
P            0.1843138  0.3253483   0.567  0.57105   
Q            0.0198326  0.0509651   0.389  0.69717   
R            0.0892239  0.0459729   1.941  0.05228 . 
S           -0.0430981  0.0856391  -0.503  0.61479   
T            0.2205653  0.1408009   1.567  0.11723   
U            0.2450243  0.1838056   1.333  0.18251   
V            0.1253683  0.0888411   1.411  0.15820   
W           -0.4636739  0.2348172  -1.975  0.04831 * 
X           -0.0623290  0.0508299  -1.226  0.22011   
Y           -0.0939878  0.0606831  -1.549  0.12142   
Z            0.0019530  0.0015143   1.290  0.19716   
AA          -0.2888123  0.2449085  -1.179  0.23829   
AB           0.1185890  0.0696343   1.703  0.08856 . 
AC          -0.3401963  0.2047698  -1.661  0.09664 . 
AD          -1.3409002  0.4858741  -2.760  0.00578 **
AE          -0.0006299  0.0051338  -0.123  0.90234   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(2.7303) family taken to be 1)

    Null deviance: 516.494  on 56  degrees of freedom
Residual deviance:  61.426  on 26  degrees of freedom
AIC: 790.8

Number of Fisher Scoring iterations: 1

              Theta:  2.730 
          Std. Err.:  0.537 
 2 x log-likelihood:  -726.803 
```
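A related check (a sketch, assuming the `m0` and `full.model` fits above): since the Poisson is the limit of the negative binomial as theta goes to infinity, a likelihood-ratio test between the two models tests for over-dispersion. Theta lies on the boundary of its parameter space under the null, so the naive chi-squared p-value is conservative and is conventionally halved.

```r
## likelihood-ratio test of Poisson (m0) vs. negative binomial (full.model)
lrt <- as.numeric(2 * (logLik(full.model) - logLik(m0)))
pchisq(lrt, df = 1, lower.tail = FALSE) / 2  # halved: theta on the boundary
```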


Then I create a starting model containing only the intercept:

```r
first.model <- glm.nb(A ~ 1, data = d)
```

Then I tried forward stepwise regression:

```r
step.model <- step(first.model, direction = "forward", scope = formula(full.model))
```

which fails with:

```
Error in glm.fit(X, y, wt, offset = offset, family = object$family, control = object$control) : 
  NA/NaN/Inf in 'x'
In addition: Warning message:
step size truncated due to divergence 
```

What is the problem?

I also tried backward stepwise regression:

```r
step.model2 <- step(full.model, direction = "backward")
```

The final step:

```
Step:  AIC=770.45
A ~ B + C + E + F + G + H + I + K + L + R + T + V + W + Y + AA + AB + AD

       Df Deviance    AIC
<none>      62.375 770.45
- AB    1   64.859 770.93
- H     1   65.227 771.30
- V     1   65.240 771.31
- L     1   65.291 771.36
- Y     1   65.831 771.90
- B     1   66.051 772.12
- C     1   67.941 774.01
- AA    1   69.877 775.95
- K     1   70.411 776.48
- W     1   71.526 777.60
- I     1   71.863 777.94
- E     1   72.338 778.41
- G     1   73.344 779.42
- F     1   73.510 779.58
- AD    1   79.620 785.69
- R     1   80.358 786.43
- T     1   95.725 801.80

Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: algorithm did not converge 
3: glm.fit: algorithm did not converge 
4: glm.fit: algorithm did not converge 
```
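One way to see whether those warnings matter for the selected model (a sketch; they may come from intermediate candidate fits rather than the final one) is to refit the final formula on its own with a generous iteration limit:

```r
## refit the backward-selected model directly and check for warnings
final.model <- glm.nb(A ~ B + C + E + F + G + H + I + K + L + R +
                        T + V + W + Y + AA + AB + AD,
                      data = d, maxit = 1000)
summary(final.model)
```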

My questions are: Why are the results different between forward and backward stepwise regression? Why do I get the error message when performing forward selection? What exactly do these warning messages mean, and how should I deal with them?

I am not a stats person, but I need to conduct statistical analysis for my research data, so I am struggling to learn how to do different regression analyses using real data. I searched online for similar questions but I still could not understand … Please also let me know if I did anything wrong in my regression analysis. I would really appreciate it if you could help me with these questions!

#### Best Answer

I have good news and bad news.

## good news

- you can probably more or less disregard the warnings. Where stepwise regression is recommended at all (see below …), backward regression is probably better than forward regression anyway.

- you can do forward and backward stepwise regression with `MASS::stepAIC()` (instead of `step`); a short sketch follows this list.
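A minimal sketch of both directions, assuming the `full.model` and `first.model` objects defined in the question:

```r
library(MASS)

## backward elimination, starting from the full model
step.model <- stepAIC(full.model, direction = "backward")

## forward selection, starting from the intercept-only model
step.model.f <- stepAIC(first.model, direction = "forward",
                        scope = list(lower = ~ 1, upper = formula(full.model)))
```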

## bad news

- `step` probably isn't doing what you think it's doing anyway. Rather than refitting the negative binomial dispersion parameter, it's re-fitting with a *fixed* overdispersion parameter, which is probably not what you want (there's a classically snarky e-mail from Prof. Brian Ripley from 2006 here that discusses this issue in passing). As mentioned above, `stepAIC()` works better.
- if you are **only interested in predictive accuracy**, and not in anything about confidence intervals or hypothesis tests or measuring variable importance … then stepwise regression might be OK (Murtaugh 2009) …
- but if you care at all about being able to make any *inferences* about the effects of the parameters, **you have too many variables and not enough data**. A rule of thumb is that (1) you need at *least* 10 times as many data points as predictor variables to do reliable inference and (2) doing any inference after selecting variables (via stepwise selection or otherwise) is *very wrong* [unless you do super-cutting-edge stuff that only works with huge data sets and very strong assumptions].

The big question here is: **why do you want to do variable selection in the first place?**

- *you're only interested in prediction*: OK, but something like penalized regression (Dahlgren 2010) will probably work better
- *you're interested in inference*: this is going to be tough; you almost certainly *don't* have enough data to tell the effects of correlated variables apart. In your situation I would probably compute the principal components (PCA) of the predictor variables and use only the first 5 (which fall within the $n/10$ rule, and explain 99.5% of the variance in the predictors …)
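A minimal sketch of the PCA route, assuming the data frame `d` from the question (response `A`, numeric predictors in the remaining columns):

```r
library(MASS)

## principal components of the scaled predictors
X  <- scale(d[, setdiff(names(d), "A")])
pc <- prcomp(X)
summary(pc)  # variance explained by each component

## negative binomial fit on the first 5 components
## (57 observations / 10 is roughly 5 predictors, per the n/10 rule)
d.pc <- data.frame(A = d$A, pc$x[, 1:5])
m.pc <- glm.nb(A ~ ., data = d.pc)
summary(m.pc)
```

And if prediction is the only goal, a lasso-penalized Poisson fit is one option (again a sketch: `glmnet` has no negative binomial family, so this ignores the over-dispersion; packages such as `mpath` offer penalized negative binomial fits):

```r
library(glmnet)

## cross-validated lasso with a Poisson likelihood
cv.fit <- cv.glmnet(as.matrix(d[, setdiff(names(d), "A")]), d$A,
                    family = "poisson")
coef(cv.fit, s = "lambda.1se")  # coefficients at the 1-SE lambda
```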

Murtaugh, Paul A. “Performance of Several Variable-Selection Methods Applied to Real Ecological Data.” Ecology Letters 12, no. 10 (October 2009): 1061–68. https://doi.org/10.1111/j.1461-0248.2009.01361.x.

Dahlgren, Johan P. “Alternative Regression Methods Are Not Considered in Murtaugh (2009) or by Ecologists in General.” Ecology Letters 13, no. 5 (May 1, 2010): E7–9. https://doi.org/10.1111/j.1461-0248.2010.01460.x.
