I am running a logistic regression in R on a binary dependent variable with a single independent variable. I found an odds ratio of 0.99 for one of the outcomes, as shown below. The odds ratio is defined as $\mathrm{ratio_{odds}}(H) = \frac{P(X=H)}{1-P(X=H)}$. As given above, $\mathrm{ratio_{odds}}(H) = 0.99$, which implies $P(X=H) = 0.497$, close to a 50% probability. This would mean the probability of an H case versus a non-H case is about 50% under the given values of the independent variable. That does not seem realistic given the data, in which only ~20% are H cases. Please clarify and explain how to interpret cases like this in logistic regression.

I am hereby adding the results of my model output:

```
M1 <- glm(H ~ X, data = data, family = binomial())
summary(M1)

Call:
glm(formula = H ~ X, family = binomial(), data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8563   0.6310   0.6790   0.7039   0.7608  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.6416666  0.2290133   7.168 7.59e-13 ***
X           -0.0014039  0.0009466  -1.483    0.138    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1101.1  on 1070  degrees of freedom
Residual deviance: 1098.9  on 1069  degrees of freedom
  (667 observations deleted due to missingness)
AIC: 1102.9

Number of Fisher Scoring iterations: 4

exp(cbind(OR = coef(M1), confint(M1)))
Waiting for profiling to be done...
                   OR     2.5 %   97.5 %
(Intercept) 5.1637680 3.3204509 8.155564
X           0.9985971 0.9967357 1.000445
```

My dataset has 1738 observations in total, with H as the binary dependent variable: 19.95% fall in the (H=0) category and the rest in (H=1). The covariate X has a minimum of 82.23, a mean of 223.8, and a maximum of 391.6. The 667 missing values are in the covariate X, i.e., X is missing for 667 of the 1738 observations.
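For reference, the odds-to-probability conversion used in the question is simple arithmetic (shown here in Python purely for illustration):

```python
# Convert an odds value to a probability: p = odds / (1 + odds).
odds = 0.99
p = odds / (1 + odds)
print(round(p, 3))  # 0.497
```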

#### Best Answer

### Summary

**The question misinterprets the coefficients.**

The software output shows that the log odds of the response don't depend appreciably on $X$, because its coefficient is small and not significant ($p=0.138$). Therefore the proportion of positive results in the data, equal to $100\% - 19.95\% \approx 80\%$, ought to have a log odds close to the intercept of $1.64$. Indeed,

$$\log\left(\frac{80\%}{20\%}\right) = \log(4) \approx 1.4$$

is only about one standard error ($0.22$) away from the intercept. Everything looks consistent.
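This consistency check is plain arithmetic; a short sketch (Python used here just for the numbers, which all come from the output above):

```python
import math

# Observed share of H = 1 cases from the question: 100% - 19.95%
p1 = 1 - 0.1995
log_odds = math.log(p1 / (1 - p1))        # log(0.8005 / 0.1995)

intercept, se = 1.6416666, 0.2290133
gap_in_se = (intercept - log_odds) / se   # distance from intercept, in SEs
print(round(log_odds, 2), round(gap_in_se, 1))  # 1.39 1.1
```

The observed log odds sit about 1.1 standard errors below the estimated intercept, well within sampling variability.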

### Detailed analysis

This generalized linear model supposes that the log odds of the response $H$ being $1$ when the independent variable $X$ has a particular value $x$ is some linear function of $x$,

$$\text{Log odds}(H=1\mid X=x) = \beta_0 + \beta_1 x.\tag{1}$$

The `glm` command in `R` estimated these unknown coefficients as $$\hat\beta_0 = 1.641666 \pm 0.2290133$$ and $$\hat\beta_1 = -0.0014039 \pm 0.0009466.$$
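Plugging these estimates into $(1)$ gives fitted probabilities directly. A minimal sketch (Python arithmetic, using the coefficients above and the range of $X$ reported in the question):

```python
import math

b0, b1 = 1.6416666, -0.0014039  # estimates from the glm summary

def p_hat(x):
    """Estimated Pr(H = 1 | X = x) implied by the fitted log odds."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# At the minimum, mean, and maximum of X the probability barely moves:
print(round(p_hat(82.23), 2), round(p_hat(223.8), 2), round(p_hat(391.6), 2))
# 0.82 0.79 0.75
```

The fitted probability of $H=1$ stays between roughly 75% and 82% across the entire observed range of $X$, consistent with ~80% of cases being $H=1$.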

The dataset contains a large number $n$ of observations with various values of $x$, written $x_i$ for $i=1, 2, \ldots, n$, which range from $82.23$ to $391.6$ and average $\bar x = 223.8$. Formula $(1)$ enables us to compute the estimated probabilities of each outcome, $\Pr(H=1\mid X=x_i)$. If the model is any good, *the average of those probabilities ought to be close to the average of the outcomes.*

Since the odds are, by definition, the ratio of a probability to its complement, we can use simple algebra to find the *estimated* probabilities in terms of the log odds

$$\widehat{\Pr}(H=1\mid X=x) = 1 - \frac{1}{1 + \exp\left(\hat\beta_0 + \hat\beta_1 x\right)}.$$

As a nonlinear function of $x$, that's difficult to average. However, provided $\hat\beta_1 x$ is small (much less than $1$ in size) and $1+\exp(\hat\beta_0)$ is not small (it exceeds $6$ in this case), we can safely use a linear approximation

$$\frac{1}{1 + \exp\left(\hat\beta_0 + \hat\beta_1 x\right)} = \frac{1}{1 + \exp(\hat\beta_0)}\left(1 - \hat\beta_1 x\, \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)}\right) + O\left(\hat\beta_1 x\right)^2.$$

Since the $x_i$ never exceed $391.6$, $|\hat\beta_1 x_i|$ never exceeds $391.6 \times 0.0014039 \approx 0.55$, so we're OK. Consequently, the average fraction of $H=0$ cases may be approximated as
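The quality of this linear approximation can be checked numerically at the mean of $X$ (a Python sketch using the fitted values from the summary):

```python
import math

b0, b1 = 1.6416666, -0.0014039
x = 223.8  # mean of X from the question

exact = 1 / (1 + math.exp(b0 + b1 * x))             # exact value of the fraction
linear = (1 / (1 + math.exp(b0))) * (
    1 - b1 * x * math.exp(b0) / (1 + math.exp(b0))  # first-order expansion
)
print(round(exact, 4), round(linear, 4))
# 0.2096 0.2049
```

Even at the mean, where $|\hat\beta_1 x| \approx 0.31$, the approximation is off by less than half a percentage point.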

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^n \left(1 - \widehat{\Pr}(H=1\mid X=x_i)\right) &\approx \frac{1}{n}\sum_{i=1}^n \frac{1}{1 + \exp(\hat\beta_0)}\left(1 - \hat\beta_1 x_i\, \frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)}\right)\\ &= 0.162238 + 0.000190814\, \bar{x} \\ &= 20.4943\%. \end{aligned}$$

Although that's not exactly equal to the $19.95\%$ observed in the data, it is more than close enough, *because $\hat\beta_1$ has a relatively large standard error.* For example, if $\hat\beta_1$ were increased by only about $0.3$ of its standard error to $-0.0011271$, the calculation would reproduce the observed $19.95\%$.
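To verify that last claim, evaluate the exact logistic formula at the mean of $X$ with the shifted slope (Python arithmetic; all figures come from the text above):

```python
import math

b0 = 1.6416666
b1_shifted = -0.0011271  # beta_1 moved about 0.3 SE toward zero
x_bar = 223.8

p0 = 1 / (1 + math.exp(b0 + b1_shifted * x_bar))  # Pr(H = 0) at the mean of X
print(round(p0, 4))
# 0.1995
```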
