Solved – Enormous coefficients in logistic regression – what does it mean and what to do

I get enormous coefficients when fitting a logistic regression; see in particular the coefficients involving krajULKV:

> summary(m5)

Call:
glm(formula = cbind(ml, ad) ~ rok + obdobi + kraj + resid_usili2 + 
    rok:obdobi + rok:kraj + obdobi:kraj + kraj:resid_usili2 + 
    rok:obdobi:kraj, family = "quasibinomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7796  -1.0958  -0.3101   1.0034   2.8370  

Coefficients:
                              Estimate     Std. Error t value Pr(>|t|)   
(Intercept)                 -486.72087      664.71911  -0.732  0.46424   
rok                            0.24232        0.33114   0.732  0.46452   
obdobinehn                  3400.43703     1354.14874   2.511  0.01223 * 
krajJHC                      786.22409      708.50291   1.110  0.26746   
krajJHM                      511.85538      823.03038   0.622  0.53417   
krajLBK                      -23.94180     2388.86316  -0.010  0.99201   
krajMSK                     1281.88767      955.09736   1.342  0.17992   
krajOLK                     -175.19425     1255.82946  -0.140  0.88909   
krajPAK                      349.76438     1071.03364   0.327  0.74408   
krajPLK                    -1335.73206     1534.09899  -0.871  0.38418   
krajSTC                      868.99157      692.30426   1.255  0.20976   
krajULKV                  245661.86828 17496742.31677   0.014  0.98880   
krajVYS                     3341.76686     1314.77140   2.542  0.01121 * 
krajZLK                     3950.75617     2922.25220   1.352  0.17676   
resid_usili2                  -1.44719        0.89315  -1.620  0.10555   
rok:obdobinehn                -1.69479        0.67462  -2.512  0.01219 * 
rok:krajJHC                   -0.39108        0.35295  -1.108  0.26817   
rok:krajJHM                   -0.25481        0.40997  -0.622  0.53443   
rok:krajLBK                    0.01621        1.19155   0.014  0.98915   
rok:krajMSK                   -0.63985        0.47592  -1.344  0.17917   
rok:krajOLK                    0.08714        0.62545   0.139  0.88923   
rok:krajPAK                   -0.17419        0.53344  -0.327  0.74410   
rok:krajPLK                    0.66539        0.76383   0.871  0.38394   
rok:krajSTC                   -0.43292        0.34490  -1.255  0.20976   
rok:krajULKV                -122.01076     8704.03367  -0.014  0.98882   
rok:krajVYS                   -1.66391        0.65468  -2.542  0.01122 * 
rok:krajZLK                   -1.96718        1.45474  -1.352  0.17667   
obdobinehn:krajJHC         -3623.86807     1385.86009  -2.615  0.00909 **
obdobinehn:krajJHM         -3220.08906     1458.83842  -2.207  0.02757 * 
obdobinehn:krajLBK         -1051.07131     3434.11845  -0.306  0.75963   
obdobinehn:krajMSK         -6415.65781     1978.30260  -3.243  0.00123 **
obdobinehn:krajOLK         -2427.66591     1777.51914  -1.366  0.17239   
obdobinehn:krajPAK         -3111.45312     1623.59145  -1.916  0.05566 . 
obdobinehn:krajPLK         -1800.26258     2065.74461  -0.871  0.38375   
obdobinehn:krajSTC         -4409.45624     1379.64196  -3.196  0.00145 **
obdobinehn:krajULKV      -187832.68360 16454272.74951  -0.011  0.99089   
obdobinehn:krajVYS         -5445.51446     1791.38012  -3.040  0.00244 **
obdobinehn:krajZLK         -6216.43343     3167.49836  -1.963  0.05003 . 
krajJHC:resid_usili2           1.60474        0.98554   1.628  0.10385   
krajJHM:resid_usili2           1.57822        1.04518   1.510  0.13143   
krajLBK:resid_usili2          11.53462       13.40012   0.861  0.38961   
krajMSK:resid_usili2          -1.33600        1.55241  -0.861  0.38971   
krajOLK:resid_usili2           0.07296        1.27034   0.057  0.95421   
krajPAK:resid_usili2           1.35880        1.23033   1.104  0.26974   
krajPLK:resid_usili2           1.90189        1.41163   1.347  0.17826   
krajSTC:resid_usili2           2.05237        0.95972   2.139  0.03277 * 
krajULKV:resid_usili2        599.79215    20568.86123   0.029  0.97674   
krajVYS:resid_usili2           3.03834        1.16464   2.609  0.00925 **
krajZLK:resid_usili2           1.18574        1.11024   1.068  0.28583   
rok:obdobinehn:krajJHC         1.80611        0.69042   2.616  0.00906 **
rok:obdobinehn:krajJHM         1.60475        0.72676   2.208  0.02751 * 
rok:obdobinehn:krajLBK         0.52268        1.71244   0.305  0.76027   
rok:obdobinehn:krajMSK         3.19712        0.98564   3.244  0.00123 **
rok:obdobinehn:krajOLK         1.21012        0.88541   1.367  0.17208   
rok:obdobinehn:krajPAK         1.55034        0.80886   1.917  0.05563 . 
rok:obdobinehn:krajPLK         0.89718        1.02893   0.872  0.38349   
rok:obdobinehn:krajSTC         2.19742        0.68732   3.197  0.00144 **
rok:obdobinehn:krajULKV       93.43130     8189.24994   0.011  0.99090   
rok:obdobinehn:krajVYS         2.71357        0.89236   3.041  0.00243 **
rok:obdobinehn:krajZLK         3.09624        1.57711   1.963  0.04996 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 1.258421)

    Null deviance: 1518.0  on 878  degrees of freedom
Residual deviance: 1228.6  on 819  degrees of freedom
  (465 observations deleted due to missingness)
AIC: NA

Number of Fisher Scoring iterations: 18

What does this mean? Is it multicollinearity, as @Scortchi mentioned in this discussion? Or is it overfitting? How can I detect the problem, and what should I do now?

I tried removing some variables. This helps a bit, but not much:

> m6 <- update(m5, ~.- kraj:resid_usili2)
> m7 <- update(m6, ~.- resid_usili2)
> summary(m7)

Call:
glm(formula = cbind(ml, ad) ~ rok + obdobi + kraj + rok:obdobi + 
    rok:kraj + obdobi:kraj + rok:obdobi:kraj, family = "quasibinomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9098  -1.1931  -0.2274   1.0529   3.1283  

Coefficients:
                           Estimate  Std. Error t value Pr(>|t|)
(Intercept)              -118.95199   476.34698  -0.250    0.803
rok                         0.05971     0.23718   0.252    0.801
obdobinehn                412.69412   646.95083   0.638    0.524
krajJHC                   447.69791   498.45358   0.898    0.369
krajJHM                   -62.92516   525.85737  -0.120    0.905
krajLBK                   677.73239  1595.20024   0.425    0.671
krajMSK                   278.24639   621.32312   0.448    0.654
krajOLK                  -705.97832   782.53474  -0.902    0.367
krajPAK                   387.96543   608.98961   0.637    0.524
krajPLK                  -653.68419   782.20737  -0.836    0.403
krajSTC                  -114.34822   489.06318  -0.234    0.815
krajULKV                -2117.64674  1797.75836  -1.178    0.239
krajVYS                   884.74411   681.05324   1.299    0.194
krajZLK                  -997.77613   925.93280  -1.078    0.281
rok:obdobinehn             -0.20602     0.32211  -0.640    0.523
rok:krajJHC                -0.22303     0.24819  -0.899    0.369
rok:krajJHM                 0.03092     0.26180   0.118    0.906
rok:krajLBK                -0.33909     0.79438  -0.427    0.670
rok:krajMSK                -0.13889     0.30935  -0.449    0.654
rok:krajOLK                 0.35102     0.38943   0.901    0.368
rok:krajPAK                -0.19392     0.30323  -0.640    0.523
rok:krajPLK                 0.32463     0.38937   0.834    0.405
rok:krajSTC                 0.05677     0.24351   0.233    0.816
rok:krajULKV                1.05287     0.89453   1.177    0.239
rok:krajVYS                -0.44149     0.33911  -1.302    0.193
rok:krajZLK                 0.49612     0.46081   1.077    0.282
obdobinehn:krajJHC       -776.31258   672.68911  -1.154    0.249
obdobinehn:krajJHM       -267.78650   700.38741  -0.382    0.702
obdobinehn:krajLBK      -1246.67321  1760.37329  -0.708    0.479
obdobinehn:krajMSK       -383.77613   858.81391  -0.447    0.655
obdobinehn:krajOLK        -96.72334   947.75189  -0.102    0.919
obdobinehn:krajPAK       -540.25140   827.13134  -0.653    0.514
obdobinehn:krajPLK       -517.49161  1124.63474  -0.460    0.645
obdobinehn:krajSTC       -683.81160   672.66674  -1.017    0.310
obdobinehn:krajULKV      2344.32314  2073.98366   1.130    0.259
obdobinehn:krajVYS       -795.62043   917.80551  -0.867    0.386
obdobinehn:krajZLK        618.33075  1093.37768   0.566    0.572
rok:obdobinehn:krajJHC      0.38725     0.33493   1.156    0.248
rok:obdobinehn:krajJHM      0.13374     0.34870   0.384    0.701
rok:obdobinehn:krajLBK      0.62237     0.87662   0.710    0.478
rok:obdobinehn:krajMSK      0.19114     0.42758   0.447    0.655
rok:obdobinehn:krajOLK      0.04842     0.47171   0.103    0.918
rok:obdobinehn:krajPAK      0.26922     0.41184   0.654    0.513
rok:obdobinehn:krajPLK      0.25790     0.55986   0.461    0.645
rok:obdobinehn:krajSTC      0.34078     0.33492   1.017    0.309
rok:obdobinehn:krajULKV    -1.16571     1.03236  -1.129    0.259
rok:obdobinehn:krajVYS      0.39675     0.45704   0.868    0.386
rok:obdobinehn:krajZLK     -0.30732     0.54422  -0.565    0.572

(Dispersion parameter for quasibinomial family taken to be 1.313286)

    Null deviance: 2396.8  on 1343  degrees of freedom
Residual deviance: 2110.3  on 1296  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

EDIT: As proposed by Scortchi, I tried to use VIF and I also get enormous values. What does this mean? See:

> require(HH)
> vif(cbind(ml, ad) ~ rok + obdobi + kraj + resid_usili2 + 
+         rok:obdobi + rok:kraj + obdobi:kraj + kraj:resid_usili2 + 
+         rok:obdobi:kraj)
                    rok              obdobinehn                 krajJHC                 krajJHM 
              50.281603         45075363.969712         15194580.406796         11362184.620230 
                krajLBK                 krajMSK                 krajOLK                 krajPAK 
         7567915.376763          5228018.864051         17105623.986998         10944471.683601 
[... cut out ...]

I would suggest that the massive coefficients, and the correspondingly massive standard errors, are almost certainly caused by complete or quasi-complete separation. That is, for some combination of predictor values, either every observation had the outcome or none did, so the corresponding coefficient heads towards infinity (or negative infinity).

This tends to happen especially when one specifies a lot of interaction terms, because the more combinations of factor levels there are, the greater the chance that some cell is "empty" (nobody in the cell has the outcome, or everybody does).
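To see what separation does to a fit, here is a minimal sketch on made-up data (the names d, grp, succ and fail are invented for illustration and are not from the question). In group "C" every trial is a success, so the estimate for grpC has no finite maximum and glm() simply stops once the deviance no longer changes. The sketch uses family = binomial for simplicity; quasibinomial would give the same coefficient estimates.

```r
# Toy data: in group "C" every trial is a success, so the log-odds for
# that group has no finite maximum-likelihood estimate.
d <- data.frame(
  grp  = factor(c("A", "A", "B", "B", "C", "C")),
  succ = c(3, 4, 5, 2, 10, 8),
  fail = c(7, 6, 5, 8,  0, 0)
)

fit <- glm(cbind(succ, fail) ~ grp, family = binomial, data = d)

# The grpC row should show a very large estimate with an even larger
# standard error, the same pattern as the krajULKV rows in the question.
coef(summary(fit))

# Number of Fisher scoring iterations used; separated fits typically
# need many more than well-behaved ones.
fit$iter
```

The symptoms to look for are the same as in the question's summary(m5): an estimate far outside any plausible log-odds range, a standard error that dwarfs it, and an unusually high iteration count.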

See the following page for some further details and suggested strategies (link updated March 2021): https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/

More generally, it means that you're probably trying to do "too much" with your model for the size of your dataset (particularly the number of outcomes observed).

EDIT: A couple of pragmatic suggestions

You might try (1) quick and simple: drop the interaction terms from your model to see if that helps (whether this makes sense from a research-question perspective is an entirely different issue); or (2) get R to make you a bi-i-i-i-g contingency table, with the combinations of factor levels described by the interactions as (e.g.) rows and the outcome variable as (e.g.) columns. You might be able to see some evidence of separation there.
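Suggestion (2) could be sketched roughly as follows. The data frame d is a made-up stand-in for the real data: the column names ml, ad, kraj and obdobi follow the model formula in the question, but the values and the "hn" level of obdobi are invented for illustration. Since the outcome is given as a pair of counts, the sketch sums both counts within each kraj x obdobi cell and flags cells where one of them is zero.

```r
# Hypothetical stand-in for the asker's data: one row per observation,
# ml / ad are the two outcome counts, kraj and obdobi are factors.
d <- data.frame(
  kraj   = factor(c("JHC", "JHC", "ULKV", "ULKV", "JHC", "ULKV")),
  obdobi = factor(c("hn", "nehn", "hn", "nehn", "hn", "nehn")),
  ml     = c(4, 6, 3, 0, 5, 0),
  ad     = c(6, 2, 5, 9, 3, 7)
)

# Total counts of each outcome within every kraj x obdobi cell
cells <- aggregate(cbind(ml, ad) ~ kraj + obdobi, data = d, FUN = sum)

# Cells where one outcome never occurs are candidates for separation
cells$suspect <- cells$ml == 0 | cells$ad == 0

# Show the suspect cells first
cells[order(-cells$suspect), ]
```

A table like this only covers the categorical part of the model. With a continuous predictor such as rok in the interactions, separation can be harder to spot by eye, so treat this as a first check rather than a definitive test.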
