There are some good answers here discussing convergence issues of logistic regression when the data are well separated. I am wondering what can cause **convergence issues when the data are not well separated**.

As an example, I have the following data, `df`:

```
    y       x1         x2
1   0 66.06402 -1.0264739
2   1 58.40813  0.2887934
3   1 58.58011  0.2626232
4   0 59.05929 -0.5286438
5   0 55.81817 -1.3184894
6   0 58.00018 -0.8445602
7   1 69.53926 -1.1018149
8   0 55.73621 -0.9000901
9   1 79.80170  0.6690657
10  0 55.40042  0.6600415
11  0 57.42124 -0.7237973
12  1 78.22012 -0.8121816
13  0 53.54296  0.2265636
14  1 56.14096  0.4216436
15  1 66.90146  0.6189839
16  0 50.40008  0.4311339
```
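For anyone who wants to reproduce this, `df` can be rebuilt directly from the listing above:

```r
# The data frame above, entered directly.
df <- data.frame(
  y  = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0),
  x1 = c(66.06402, 58.40813, 58.58011, 59.05929, 55.81817, 58.00018,
         69.53926, 55.73621, 79.80170, 55.40042, 57.42124, 78.22012,
         53.54296, 56.14096, 66.90146, 50.40008),
  x2 = c(-1.0264739, 0.2887934, 0.2626232, -0.5286438, -1.3184894,
         -0.8445602, -1.1018149, -0.9000901, 0.6690657, 0.6600415,
         -0.7237973, -0.8121816, 0.2265636, 0.4216436, 0.6189839,
         0.4311339)
)
str(df)  # 16 obs. of 3 variables: y, x1, x2
```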

Fitting a logistic regression in `R`, I am getting a

`glm.fit: fitted probabilities numerically 0 or 1 occurred`

warning message even though the data are non-separable:

```
> attach(df)
> safeBinaryRegression::glm(y ~ x1 + x2, family = binomial)

Call:  safeBinaryRegression::glm(formula = y ~ x1 + x2, family = binomial)

Coefficients:
(Intercept)           x1           x2
    -82.930        1.395       10.255

Degrees of Freedom: 15 Total (i.e. Null);  13 Residual
Null Deviance:      21.93
Residual Deviance: 5.927        AIC: 11.93
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
```

A visual confirmation that the data are in fact non-separable is also included.

Removing the red point seems to resolve the convergence issue; however, I am at a bit of a loss as to why:

```
> df2 <- df[-c(9), ]
> detach(df)
> attach(df2)
> safeBinaryRegression::glm(y ~ x1 + x2, family = binomial)

Call:  safeBinaryRegression::glm(formula = y ~ x1 + x2, family = binomial)

Coefficients:
(Intercept)           x1           x2
    -82.930        1.395       10.255

Degrees of Freedom: 14 Total (i.e. Null);  12 Residual
Null Deviance:      20.19
Residual Deviance: 5.927        AIC: 11.93
```
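As a numerical cross-check of non-separability (a sketch using base `glm` rather than the `safeBinaryRegression` wrapper): if the fitted linear predictors for the two classes overlap, the fitted hyperplane does not perfectly separate them.

```r
# Rebuild df (values from the question), then check overlap along the
# fitted direction: if the largest linear predictor among the 0's exceeds
# the smallest among the 1's, that hyperplane does not separate the classes.
df <- data.frame(
  y  = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0),
  x1 = c(66.06402, 58.40813, 58.58011, 59.05929, 55.81817, 58.00018,
         69.53926, 55.73621, 79.80170, 55.40042, 57.42124, 78.22012,
         53.54296, 56.14096, 66.90146, 50.40008),
  x2 = c(-1.0264739, 0.2887934, 0.2626232, -0.5286438, -1.3184894,
         -0.8445602, -1.1018149, -0.9000901, 0.6690657, 0.6600415,
         -0.7237973, -0.8121816, 0.2265636, 0.4216436, 0.6189839,
         0.4311339)
)
fit <- glm(y ~ x1 + x2, family = binomial, data = df)  # may emit the warning
eta <- predict(fit, type = "link")
overlap <- max(eta[df$y == 0]) > min(eta[df$y == 1])
overlap  # TRUE: no perfect separation along the fitted direction
```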


#### Best Answer

**The warning about "fitted probabilities numerically 0 or 1" might be useful for diagnosing separability, but these issues are only indirectly related.**

Here is a dataset and a binomial GLM fit (in gray) where there is enough overlap among the $x$ values for the two response classes that there is little concern about separability. In particular, the estimate of the $x$ coefficient of $2.35$ is modest and significant: its standard error is only $1.1$ $(p=0.03)$. The gray curve shows the fit. Corresponding to values on this curve are their log odds, or "link" function. Those I have indicated with colors; the legend gives the common (base-10) logs. The software flags fitted values that are within $2.22\times 10^{-15}$ of either $0$ or $1$. Such points have white halos around them.
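That threshold corresponds to `10 * .Machine$double.eps`, which is what base R's `glm.fit` uses when deciding whether to emit this warning. A quick check:

```r
# glm.fit flags fitted means within eps of 0 or 1, where
# eps = 10 * .Machine$double.eps (about 2.22e-15 on IEEE doubles).
eps <- 10 * .Machine$double.eps
eps
```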

All that's going on here is there's such a wide range of $x$ values that for some points, the fit is very, very close to $0$ (for very negative $x$) or very, very close to $1$ (for the most positive $x$). This isn't a problem in this case.

It might be a problem in the next example. Now a single outlying value of $x$ triggers the warning message.

How can we assess this? *Simply delete the datum and re-fit the model.* In this example, it makes almost no difference: the coefficient estimate does not change, nor does the p-value.
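Applied to the asker's data, the delete-and-refit check looks like this (a sketch using base `glm`; row 9 is the flagged red point):

```r
# Rebuild the question's df, drop the flagged point (row 9, the red dot),
# and compare the two sets of coefficient estimates side by side.
df <- data.frame(
  y  = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0),
  x1 = c(66.06402, 58.40813, 58.58011, 59.05929, 55.81817, 58.00018,
         69.53926, 55.73621, 79.80170, 55.40042, 57.42124, 78.22012,
         53.54296, 56.14096, 66.90146, 50.40008),
  x2 = c(-1.0264739, 0.2887934, 0.2626232, -0.5286438, -1.3184894,
         -0.8445602, -1.1018149, -0.9000901, 0.6690657, 0.6600415,
         -0.7237973, -0.8121816, 0.2265636, 0.4216436, 0.6189839,
         0.4311339)
)
fit_all  <- glm(y ~ x1 + x2, family = binomial, data = df)
fit_drop <- glm(y ~ x1 + x2, family = binomial, data = df[-9, ])
cbind(all = coef(fit_all), dropped = coef(fit_drop))  # compare estimates
```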

Finally, to check a multiple regression, first form the linear combinations of the coefficient estimates and the variables, $x_i\hat\beta$: this is the link function. Plot the responses against these values exactly as above and study the patterns, looking at (a) the degree to which the 1's overlap the 0's (which assesses separability) and (b) the points with extreme values of the link.
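For the asker's data, a minimal sketch of this diagnostic with base `glm` (the linear predictor $x_i\hat\beta$ is what `predict(..., type = "link")` returns):

```r
# Rebuild df, form the linear predictor x_i beta-hat (the "link" scale),
# and plot the responses against it.
df <- data.frame(
  y  = c(0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0),
  x1 = c(66.06402, 58.40813, 58.58011, 59.05929, 55.81817, 58.00018,
         69.53926, 55.73621, 79.80170, 55.40042, 57.42124, 78.22012,
         53.54296, 56.14096, 66.90146, 50.40008),
  x2 = c(-1.0264739, 0.2887934, 0.2626232, -0.5286438, -1.3184894,
         -0.8445602, -1.1018149, -0.9000901, 0.6690657, 0.6600415,
         -0.7237973, -0.8121816, 0.2265636, 0.4216436, 0.6189839,
         0.4311339)
)
fit <- glm(y ~ x1 + x2, family = binomial, data = df)
eta <- predict(fit, type = "link")
plot(eta, df$y, pch = 19,
     xlab = "linear predictor (log odds)", ylab = "response y")
abline(v = 0, lty = 2)  # the 0's transition to 1's near eta = 0
```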

Here is the plot for your data:

The point at the far right corresponds to the red dot in your figure: the fitted value is $1$ because that dot is far from the area where 0's transition to 1's. If you remove it from the data, nothing changes. Thus, it's not influencing the results. This graph indicates you have obtained a reasonable fit.

You can also see that slight changes in the values of $x_1$ or $x_2$ at a couple of critical points (those near $0$) could create perfect separation. But is this really a problem? It would only mean that the software could no longer distinguish between this fit and other fits with arbitrarily sharp transitions near $x\beta=0$. However, all would produce similar predictions at all points sufficiently far from the transition line and the location of that line would still be fairly well estimated.