# Interpretation of coefficients in logistic regression output

I am doing logistic regression in R on a binary dependent variable with only one independent variable. I found an odds ratio of 0.99 for one of the outcomes, which can be analysed as follows. The odds ratio is defined as \$\mathrm{ratio}_{odds}(H) = \frac{P(X=H)}{1-P(X=H)}\$. As given above, \$\mathrm{ratio}_{odds}(H) = 0.99\$, which implies \$P(X=H) = 0.497\$, close to a 50% probability. This would mean that the probability of an H case (or a non-H case) is about 50% at the given values of the independent variable. That does not seem realistic from the data, because only about 20% of the observations are H cases. Please clarify and explain how to interpret this kind of situation in logistic regression.
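For reference, the odds-to-probability arithmetic used in that step can be checked directly (a minimal sketch in Python; the value 0.99 is taken from the model output below):

```python
# Treating the reported value 0.99 as odds, the implied probability is
# p = odds / (1 + odds); this is the arithmetic used in the question.
odds = 0.99
p = odds / (1 + odds)
print(round(p, 3))  # 0.497, i.e. close to 50%
```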

Here are the results of my model:

```
M1 <- glm(H ~ X, data = data, family = binomial())
summary(M1)

Call:
glm(formula = H ~ X, family = binomial(), data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8563   0.6310   0.6790   0.7039   0.7608  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.6416666  0.2290133   7.168 7.59e-13 ***
X           -0.0014039  0.0009466  -1.483    0.138    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1101.1  on 1070  degrees of freedom
Residual deviance: 1098.9  on 1069  degrees of freedom
  (667 observations deleted due to missingness)
AIC: 1102.9

Number of Fisher Scoring iterations: 4

exp(cbind(OR = coef(M1), confint(M1)))
Waiting for profiling to be done...
                   OR     2.5 %   97.5 %
(Intercept) 5.1637680 3.3204509 8.155564
X           0.9985971 0.9967357 1.000445
```

I have 1738 observations in total, of which H is the binary dependent variable. 19.95% of the observations fall in the (H=0) category and the remainder are in the (H=1) category. This dependent variable is modelled against the covariate X, whose minimum value is 82.23, mean value is 223.8, and maximum value is 391.6. The 667 missing values correspond to the covariate X, i.e., 667 of the 1738 values of X are missing from the dataset.


### Summary

The question misinterprets the coefficients.

The software output shows that the log odds of the response do not depend appreciably on \$X\$, because its coefficient is small and not significant (\$p=0.138\$). Therefore the proportion of positive results in the data, equal to \$100\% - 19.95\% \approx 80\%\$, ought to have log odds close to the intercept of \$1.64\$. Indeed,

\$\$\log\left(\frac{80\%}{20\%}\right) = \log(4) \approx 1.4\$\$

is only about one standard error (\$0.22\$) away from the intercept. Everything looks consistent.
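As a quick numeric check (an illustrative Python sketch using only the figures quoted above):

```python
from math import log

# Figures quoted from the model output and the data description
intercept = 1.6416666
se_intercept = 0.2290133
p_positive = 1 - 0.1995          # about 80% of cases have H = 1

# Observed log odds of a positive outcome
log_odds = log(p_positive / (1 - p_positive))

# Distance between intercept and observed log odds, in standard errors
z = (intercept - log_odds) / se_intercept
print(round(log_odds, 2), round(z, 1))  # about 1.39 and 1.1
```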

### Detailed analysis

This generalized linear model supposes that the log odds of the response \$H\$ being \$1\$ when the independent variable \$X\$ has a particular value \$x\$ is some linear function of \$x\$,

\$\$\text{Log odds}(H=1\,|\,X=x) = \beta_0 + \beta_1 x.\tag{1}\$\$

The `glm` command in `R` estimated these unknown coefficients as \$\$\hat\beta_0 = 1.6416666 \pm 0.2290133\$\$ and \$\$\hat\beta_1 = -0.0014039 \pm 0.0009466.\$\$
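These estimates are on the log odds scale. The `exp(cbind(OR=coef(M1), confint(M1)))` line in the output simply exponentiates them, turning the intercept into odds and the coefficient of \$X\$ into an odds ratio per unit of \$X\$. A sanity check of that conversion (a Python sketch, mirroring the R computation):

```python
from math import exp

# Coefficient estimates from the summary output
b0 = 1.6416666
b1 = -0.0014039

# Exponentiating gives the "OR" column of the output
print(exp(b0))  # about 5.16377, the odds at X = 0
print(exp(b1))  # about 0.99860, the odds ratio per unit increase in X
```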

The dataset contains a large number \$n\$ of observations with various values of \$x\$, written \$x_i\$ for \$i=1, 2, \ldots, n\$, which range from \$82.23\$ to \$391.6\$ and average \$\bar x = 223.8\$. Formula \$(1)\$ enables us to compute the estimated probabilities of each outcome, \$\Pr(H=1\,|\,X=x_i)\$. If the model is any good, the average of those probabilities ought to be close to the average of the outcomes.

Since the odds are, by definition, the ratio of a probability to its complement, we can use simple algebra to find the estimated probabilities in terms of the log odds:

\$\$\widehat{\Pr}(H=1\,|\,X=x) = 1 - \frac{1}{1 + \exp\left(\hat\beta_0 + \hat\beta_1 x\right)}.\$\$
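A small sketch that evaluates this expression at the minimum, mean, and maximum of \$X\$ (illustrative only; the exact fitted values would require the raw data):

```python
from math import exp

b0 = 1.6416666
b1 = -0.0014039

def p_hat(x):
    # Estimated Pr(H = 1 | X = x) from the fitted log odds
    return 1 - 1 / (1 + exp(b0 + b1 * x))

# Minimum, mean, and maximum of X, as reported in the question;
# all three probabilities lie between about 0.75 and 0.82
for x in (82.23, 223.8, 391.6):
    print(round(p_hat(x), 3))
```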

As a nonlinear function of \$x\$, that's difficult to average. However, provided \$\beta_1 x\$ is small (much less than \$1\$ in size) and \$1+\exp(\hat\beta_0)\$ is not small (it exceeds \$6\$ in this case), we can safely use a linear approximation

\$\$\frac{1}{1 + \exp\left(\hat\beta_0 + \hat\beta_1 x\right)} = \frac{1}{1 + \exp(\hat\beta_0)}\left(1 - \hat\beta_1 x\, \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)}\right) + O\left(\hat\beta_1 x\right)^2.\$\$
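One can compare the exact value \$1/(1+\exp(\hat\beta_0+\hat\beta_1 x))\$ with this first-order approximation over the observed range of \$X\$ (a rough check, assuming only the summary figures above):

```python
from math import exp

b0 = 1.6416666
b1 = -0.0014039

def exact(x):
    return 1 / (1 + exp(b0 + b1 * x))

def linear(x):
    # First-order expansion around beta1 * x = 0
    q = 1 / (1 + exp(b0))          # about 0.1622
    w = exp(b0) / (1 + exp(b0))    # about 0.8378
    return q * (1 - b1 * x * w)

# Agreement is good near the mean of X and degrades slowly toward the maximum
for x in (82.23, 223.8, 391.6):
    print(round(exact(x), 4), round(linear(x), 4))
```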

Since the \$x_i\$ never exceed \$391.6\$, \$|\hat\beta_1 x_i|\$ never exceeds \$391.6 \times 0.0014039 \approx 0.55\$, so the approximation is adequate. Consequently, because the comparison of interest is with the \$19.95\%\$ of cases in the \$(H=0)\$ category, the average of the estimated probabilities of \$H=0\$ may be approximated as

\$\$\eqalign{ \frac{1}{n}\sum_{i=1}^n \widehat{\Pr}(H=0\,|\,X=x_i) &\approx \frac{1}{n}\sum_{i=1}^n \frac{1}{1 + \exp(\hat\beta_0)}\left(1 - \hat\beta_1 x_i\, \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)}\right) \cr &= 0.162238 + 0.000190814\, \bar{x} \cr &= 20.4943\%. }\$\$

Although that is not exactly equal to the \$19.95\%\$ observed in the data, it is more than close enough, because \$\hat\beta_1\$ has a relatively large standard error. For example, if \$\hat\beta_1\$ were increased by only about \$0.3\$ of its standard error, to \$-0.0011271\$, the corresponding calculation would reproduce the observed \$19.95\%\$.
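That averaging can be reproduced numerically (a sketch using only the reported mean \$\bar x = 223.8\$ and the coefficient estimates):

```python
from math import exp

b0 = 1.6416666
b1 = -0.0014039
x_bar = 223.8  # mean of X, as reported

q = 1 / (1 + exp(b0))           # about 0.162238
w = exp(b0) / (1 + exp(b0))     # about 0.837762
slope = -b1 * q * w             # about 0.000190814 per unit of X

avg_p0 = q + slope * x_bar      # approximate average Pr(H = 0)
print(round(avg_p0, 4))         # about 0.2049, versus 0.1995 observed
```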
