I am looking to fit a Cox proportional hazard survival model. Looking at the K-M curve (below) for one variable (with 2 categories) it appears there is a change in hazard ratios at around day 110. I was thinking of modeling this with a change-point model.
I'm having trouble implementing it. I have defined days_ind as 1 if days>=110 and 0 otherwise. Then I run the model:
coxph(Surv(time=days,event=event2)~x*days_ind)
I get several warning messages about convergence and the results don't seem to make any sense.
Am I approaching this in the correct way? I thought of bringing in the interaction of x*days instead but this too does not converge and also leads to strange estimates.
Best Answer
You need to make days_ind
a time-dependent variable. The way you have it coded right now, everybody whose observation (whether event or censoring time) was after 110 days will have experienced a different hazard throughout their entire followup then those whose observation is before 110 days. What you want to have is for the hazard to "jump" at 110 days.
It is not completely straightforward to set up this analysis in the survival
package. You have to split the follow-up period of each person into two periods: up to 110 days and after that. Anybody surviving beyond 110 days would have two observations: one right-censored at 110, and the other left-truncated at 110 and having the actual event on the right side. Fortunately, there is a function to do exactly that: survSplit
.
Here is a quick example with a built-in dataset:
> library(survival) > aml$id <- 1:nrow(aml) # add a subject ID variable > aml2 <- survSplit(aml,cut=10,end="time",start="start", event="status", episode="period") > > subset(aml, id<=3) time status x id 1 9 1 Maintained 1 2 13 1 Maintained 2 3 13 0 Maintained 3 > subset(aml2, id<=3) time status x id start period 1 9 1 Maintained 1 0 0 2 10 0 Maintained 2 0 0 3 10 0 Maintained 3 0 0 25 13 1 Maintained 2 10 1 26 13 0 Maintained 3 10 1
You can see that there are now two observations for id's 2 and 3. The period
variable corresponds to your days_ind
.
From here you can build the model you want, but you have to code the effects carefully, because the effect of period
cannot be estimated, since this it refers to different times.
> fit <- coxph(Surv(start, time, status) ~ I((x=="Maintained")&(period==0)) + I((x=="Maintained")&(period==1)), data=aml2) > fit Call: coxph(formula = Surv(start, time, status) ~ I((x == "Maintained") & (period == 0)) + I((x == "Maintained") & (period == 1)), data = aml2) coef exp(coef) se(coef) z p I((x == "Maintained") & (period == 0))TRUE -1.498 0.224 1.120 -1.34 0.18 I((x == "Maintained") & (period == 1))TRUE -0.722 0.486 0.591 -1.22 0.22 Likelihood ratio test=3.79 on 2 df, p=0.150 n= 41, number of events= 18
Here the two coefficients measure the effect of Maintained vs Non-maintained before 10 days and after 10 days, respectively.
You could also consider using the cmprsk
package. It is designed for analysis of competing risks, but there is nothing stopping you from using it for only one outcome. The benefit is that it has an easier way of defining time-dependent covariates (though a really awkward syntax overall):
> library(cmprsk) > fit1 <- with(aml, crr(time, status, cov1=I(x=="Maintained"), cov2=I(x=="Maintained"), + tf=function(t)I(t<=10))) > fit1 convergence: TRUE coefficients: I(x == "Maintained")1 I(x == "Maintained")1*tf1 -0.7213 -0.7387 standard errors: [1] 0.5259 1.1500 two-sided p-values: I(x == "Maintained")1 I(x == "Maintained")1*tf1 0.17 0.52
Note that with the different coding, the meaning of the coefficients is not exactly the same as above.
Similar Posts:
- Solved – How to manually calculate `survfit` in cox hazard model
- Solved – Cox baseline hazard
- Solved – Time-dependent coefficients in cox regression CPH (RMS)
- Solved – R: Survfit function – getting p value for a specified time period
- Solved – R: Survfit function – getting p value for a specified time period