Solved – Interpretation of coefficients in logistic regression output

I am doing logistic regression in R on a binary dependent variable with a single independent variable. I found an odds ratio of 0.99 for one outcome, which can be shown as follows. The odds are defined as $\mathrm{odds}(H) = \frac{P(X=H)}{1-P(X=H)}$. As given above, $\mathrm{odds}(H) = 0.99$, which implies $P(X=H) = 0.497$, close to a 50% probability. This suggests that the probability of an H case versus a non-H case is about 50% under the given values of the independent variable. That does not seem realistic, since only about 20% of the data are H cases. Please clarify how to interpret this kind of result in logistic regression.
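(The conversion the question performs, from an odds of 0.99 back to a probability, is a one-liner; here it is in Python for concreteness:)

```python
# The question's conversion from odds to probability:
# odds = p / (1 - p)  =>  p = odds / (1 + odds)
odds = 0.99
p = odds / (1 + odds)
print(round(p, 3))  # 0.497, i.e. close to 50%
```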

I am hereby adding the results of my model output:

```
> M1 <- glm(H ~ X, data = data, family = binomial())
> summary(M1)

Call:
glm(formula = H ~ X, family = binomial(), data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8563   0.6310   0.6790   0.7039   0.7608  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.6416666  0.2290133   7.168 7.59e-13 ***
X           -0.0014039  0.0009466  -1.483    0.138    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1101.1  on 1070  degrees of freedom
Residual deviance: 1098.9  on 1069  degrees of freedom
  (667 observations deleted due to missingness)
AIC: 1102.9

Number of Fisher Scoring iterations: 4

> exp(cbind(OR = coef(M1), confint(M1)))
Waiting for profiling to be done...
                   OR     2.5 %   97.5 %
(Intercept) 5.1637680 3.3204509 8.155564
X           0.9985971 0.9967357 1.000445
```

The dataset contains 1738 observations in total. $H$ is the binary dependent variable: 19.95% of observations fall in the $H=0$ category and the rest in $H=1$. The covariate $X$ has minimum 82.23, mean 223.8, and maximum 391.6. The 667 missing values are in $X$, i.e. 667 of the 1738 observations have no value for $X$.


The question misinterprets the coefficients.

The software output shows that the log odds of the response don't depend appreciably on $X$, because its coefficient is small and not significant ($p=0.138$). Therefore the proportion of positive results in the data, equal to $100\% - 19.95\% \approx 80\%$, ought to correspond to a log odds close to the intercept of $1.64$. Indeed,

$$\log\left(\frac{80\%}{20\%}\right) = \log(4) \approx 1.4$$

is only about one standard error ($0.22$) away from the intercept. Everything looks consistent.
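This back-of-the-envelope check is easy to reproduce; here is the arithmetic in Python, using the intercept and standard error from the output above:

```python
import math

b0, se0 = 1.6416666, 0.2290133    # intercept estimate and its standard error
log_odds = math.log(0.80 / 0.20)  # log odds implied by ~80% positive outcomes
print(round(log_odds, 2))               # about 1.39
print(round((b0 - log_odds) / se0, 1))  # about 1.1 standard errors away
```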

Detailed analysis

This generalized linear model supposes that the log odds of the response $H$ being $1$ when the independent variable $X$ has a particular value $x$ is some linear function of $x$,

$$\text{Log odds}(H=1\mid X=x) = \beta_0 + \beta_1 x.\tag{1}$$

The `glm` command in R estimated these unknown coefficients as $$\hat\beta_0 = 1.6416666 \pm 0.2290133$$ and $$\hat\beta_1 = -0.0014039 \pm 0.0009466.$$
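Exponentiating these estimates reproduces the `OR` column of the output; a quick check in Python (values taken from the summary above):

```python
import math

b0, b1 = 1.6416666, -0.0014039
print(math.exp(b0))  # ~5.1638, matches the (Intercept) OR
print(math.exp(b1))  # ~0.9986, the "0.99" odds ratio for X
```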

The dataset contains a large number $n$ of observations with various values of $x$, written $x_i$ for $i=1, 2, \ldots, n$, which range from $82.23$ to $391.6$ and average $\bar x = 223.8$. Formula $(1)$ enables us to compute the estimated probability of each outcome, $\widehat{\Pr}(H=1\mid X=x_i)$. If the model is any good, the average of those probabilities ought to be close to the average of the outcomes.

Since the odds are, by definition, the ratio of a probability to its complement, simple algebra gives the estimated probabilities in terms of the log odds:

$$\widehat{\Pr}(H=1\mid X=x) = 1 - \frac{1}{1 + \exp\left(\hat\beta_0 + \hat\beta_1 x\right)}.$$
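Plugging in the fitted coefficients at, say, the mean of $X$ shows this formula already resolves the question's puzzle; a sketch in Python (coefficients and $\bar x = 223.8$ from the question):

```python
import math

b0, b1 = 1.6416666, -0.0014039

def p_hat(x):
    """Estimated Pr(H = 1 | X = x) from the fitted log odds."""
    return 1 - 1 / (1 + math.exp(b0 + b1 * x))

print(round(p_hat(223.8), 3))  # at the mean of X: about 0.79, close to the observed 80%
```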

As a nonlinear function of $x$, that is difficult to average. However, provided $\hat\beta_1 x$ is small (much less than $1$ in size) and $1+\exp(\hat\beta_0)$ is not small (here it exceeds $6$), we can safely use a linear approximation:

$$\frac{1}{1 + \exp\left(\hat\beta_0 + \hat\beta_1 x\right)} = \frac{1}{1 + \exp(\hat\beta_0)}\left(1 - \hat\beta_1 x\, \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)}\right) + O\left(\hat\beta_1 x\right)^2.$$

Since the $x_i$ never exceed $391.6$, $|\hat\beta_1 x_i|$ never exceeds $391.6 \times 0.0014039 \approx 0.55$, so we are within the range where the approximation is reasonable. Consequently, the average of the estimated probabilities of $H=0$ (the complements of the $\widehat{\Pr}(H=1\mid X=x_i)$) may be approximated as

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^n \left(1 - \widehat{\Pr}(H=1\mid X=x_i)\right) &\approx \frac{1}{n}\sum_{i=1}^n \frac{1}{1 + \exp(\hat\beta_0)}\left(1 - \hat\beta_1 x_i\, \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)}\right)\\ &= 0.162238 + 0.000190814\, \bar{x} \\ &= 20.4943\%. \end{aligned}$$
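The constants $0.162238$ and $0.000190814$ in this calculation can be verified numerically; a sketch in Python, using the fitted coefficients and $\bar x = 223.8$ from the question:

```python
import math

b0, b1, xbar = 1.6416666, -0.0014039, 223.8

a = 1 / (1 + math.exp(b0))                       # intercept term, ~0.162238
s = -b1 * a * math.exp(b0) / (1 + math.exp(b0))  # slope term, ~0.000190814
avg_p0 = a + s * xbar
print(round(avg_p0, 4))  # ~0.2049, i.e. about 20.49%
```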

Although that is not exactly the $19.95\%$ observed in the data, it is more than close enough, because $\hat\beta_1$ has a relatively large standard error. For example, if $\hat\beta_1$ were increased by only $0.3$ of its standard error, to $-0.0011271$, the same calculation would produce $19.95\%$ exactly.
