I am working with pbc dataset in R and would like to build 95% confidence interval for the comparison of two groups:
- 60-year-old males on DPCA with bilirubin = 1 mg/dL
- 40-year-old females on placebo with bilirubin = 0.5 mg/dL
In order to to do this I built Cox regression:
cox.adj = coxph(Surv(time, status) ~ trt + age.cat + bili + sex, data = data) summary(cox.adj) coef exp(coef) se(coef) z Pr(>|z|) trt1 0.10349 1.10904 0.18644 0.555 0.57882 age.cat 2. [42, 50) -0.01111 0.98895 0.30553 -0.036 0.97099 age.cat 3. [50, 57) 0.52052 1.68291 0.28611 1.819 0.06887 . age.cat 4. >= 57 0.87540 2.39983 0.28495 3.072 0.00213 ** bili 0.14951 1.16127 0.01343 11.136 < 2e-16 *** sexf -0.50976 0.60064 0.24207 -2.106 0.03522 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 exp(coef) exp(-coef) lower .95 upper .95 trt1 1.1090 0.9017 0.7696 1.5982 age.cat 2. [42, 50) 0.9890 1.0112 0.5434 1.7999 age.cat 3. [50, 57) 1.6829 0.5942 0.9606 2.9485 age.cat 4. >= 57 2.3998 0.4167 1.3729 4.1950 bili 1.1613 0.8611 1.1311 1.1922 sexf 0.6006 1.6649 0.3737 0.9653
and made predictions:
data_test1 age.cat sex trt bili 1 4. >= 57 m 1 1.0 2 1. <42 f 0 0.5 prediction = predict(cox.adj, newdata = data_test1, se = T) prediction $fit 1 2 0.7007238 -0.8626783 $se.fit 1 2 0.2556641 0.2074737
Hazard ratio:
exp(0.7007238) / exp(-0.8626783) = 4.78
So, 60-year-old males on DPCA with bilirubin = 1.0 mg/dL have 4.78 times the hazard of mortality when compared to 40-year-old females on placebo with bilirubin = 0.5 mg/dL.
The question is how to calculate 95% cofidence interval for this comparison? My approach is:
exp(0.7007238 - 2 * 0.2556641) / exp(-0.8626783 - 2 * 0.2074737) exp(0.7007238 + 2 * 0.2556641) / exp(-0.8626783 + 2 * 0.2074737)
which gives me [4.336299, 5.258169] but I have information that 95% confidence interval probably should be about [2.3, 10]. So, my question is – are my calculations correct or not? If not, how should I calculate 95% confidence interval for this comparison?
Here are slides which I am trying to replicate in R:
Best Answer
You can't take the ratios of the individual hazard ratio CI limits to do this. Exponentiating the estimates based on the Cox model coefficients to get hazard ratios is the last thing that you should do when you are performing calculations with Cox models. You work in the scale of model coefficients, whose estimates are assumed to have multivariate normal distributions, until the end.
To get error estimates yourself, you must take into account both the point estimates of the Cox regression coefficients and the variance covariance matrix among the estimates. In R that matrix is returned by the vcov()
function applied to the model object. Apply the formula for the variance of sums of correlated variables to estimate the variance of the difference in linear predictors (log hazard ratio) between the conditions you have specified, and use that to predict the 95% CI around the point estimate of the difference. Only then do you exponentiate to get the CI in terms of hazard ratios.
If $x_1$ and $x_2$ are column vectors of predictor values that you want to compare, take the vector difference between them, $X = x_2 -x_1$. If $hat beta$ is the column vector of coefficient estimates, then the point estimate of the difference in log hazard ratio between the 2 situations is the inner product $X^T hat beta$. If $V$ is the variance-covariance matrix among the coefficient estimates, the variance of the estimated difference is the product $X^T V X$. Use that to get the 95% CI for the log hazard ratio difference, assuming a normal distribution. Finally you exponentiate to get the hazard ratio CI between the 2 situations.
Note that this can be done for you. For example, the rms package
in R has a contrast()
function that can do this for any comparisons of situations for a wide variety of regression models, including Cox models, and with a variety of other approaches to estimating CI.