Solved – Change-point in Cox survival model

I am looking to fit a Cox proportional hazard survival model. Looking at the K-M curve (below) for one variable (with 2 categories) it appears there is a change in hazard ratios at around day 110. I was thinking of modeling this with a change-point model.


I'm having trouble implementing it. I have defined days_ind as 1 if days>=110 and 0 otherwise. Then I run the model:


I get several warning messages about convergence and the results don't seem to make any sense.

Am I approaching this in the correct way? I thought of bringing in the interaction of x*days instead but this too does not converge and also leads to strange estimates.

You need to make days_ind a time-dependent variable. The way you have it coded right now, everybody whose observation (whether event or censoring time) was after 110 days will have experienced a different hazard throughout their entire followup then those whose observation is before 110 days. What you want to have is for the hazard to "jump" at 110 days.

It is not completely straightforward to set up this analysis in the survival package. You have to split the follow-up period of each person into two periods: up to 110 days and after that. Anybody surviving beyond 110 days would have two observations: one right-censored at 110, and the other left-truncated at 110 and having the actual event on the right side. Fortunately, there is a function to do exactly that: survSplit.

Here is a quick example with a built-in dataset:

> library(survival) > aml$id <- 1:nrow(aml)  # add a subject ID variable > aml2 <- survSplit(aml,cut=10,end="time",start="start", event="status", episode="period") >  > subset(aml, id<=3)   time status          x id 1    9      1 Maintained  1 2   13      1 Maintained  2 3   13      0 Maintained  3 > subset(aml2, id<=3)    time status          x id start period 1     9      1 Maintained  1     0      0 2    10      0 Maintained  2     0      0 3    10      0 Maintained  3     0      0 25   13      1 Maintained  2    10      1 26   13      0 Maintained  3    10      1 

You can see that there are now two observations for id's 2 and 3. The period variable corresponds to your days_ind.

From here you can build the model you want, but you have to code the effects carefully, because the effect of period cannot be estimated, since this it refers to different times.

> fit <- coxph(Surv(start, time, status) ~     I((x=="Maintained")&(period==0)) + I((x=="Maintained")&(period==1)), data=aml2) > fit Call: coxph(formula = Surv(start, time, status) ~ I((x == "Maintained") &      (period == 0)) + I((x == "Maintained") & (period == 1)),      data = aml2)                                                coef exp(coef) se(coef)     z    p I((x == "Maintained") & (period == 0))TRUE -1.498     0.224    1.120 -1.34 0.18 I((x == "Maintained") & (period == 1))TRUE -0.722     0.486    0.591 -1.22 0.22  Likelihood ratio test=3.79  on 2 df, p=0.150  n= 41, number of events= 18  

Here the two coefficients measure the effect of Maintained vs Non-maintained before 10 days and after 10 days, respectively.

You could also consider using the cmprsk package. It is designed for analysis of competing risks, but there is nothing stopping you from using it for only one outcome. The benefit is that it has an easier way of defining time-dependent covariates (though a really awkward syntax overall):

> library(cmprsk) > fit1 <- with(aml, crr(time, status, cov1=I(x=="Maintained"), cov2=I(x=="Maintained"),  +                      tf=function(t)I(t<=10))) > fit1 convergence:  TRUE  coefficients:     I(x == "Maintained")1 I(x == "Maintained")1*tf1                    -0.7213                   -0.7387  standard errors: [1] 0.5259 1.1500 two-sided p-values:     I(x == "Maintained")1 I(x == "Maintained")1*tf1                       0.17                      0.52  

Note that with the different coding, the meaning of the coefficients is not exactly the same as above.

Similar Posts:

Rate this post

Leave a Comment