Solved – Calculate Harrell’s C-index from random survival forest

I'm fitting a random survival forest in R using the ranger package, and I'm curious about how the OOB error rate is calculated. According to the documentation it is calculated as one minus Harrell's C-index. However, calculating the C-index requires two components: the actual survival times, and the predicted survival times (or at the very least a rank-ordering of my observations by expected survival time).
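For concreteness, Harrell's C-index needs only a rank-ordering of predicted risk plus the observed times and censoring indicators, not the predicted survival times themselves. A minimal sketch of the pairwise definition (in Python rather than R, with a hypothetical `harrell_c` helper):

```python
def harrell_c(time, event, risk):
    """Harrell's C: fraction of usable pairs in which the subject with the
    higher predicted risk has the shorter survival time. A pair is usable
    only when the earlier observed time belongs to an uncensored subject.
    time:  observed times; event: 1 = event, 0 = censored; risk: predicted risk scores.
    """
    n = len(time)
    concordant = ties = usable = 0
    for i in range(n):
        for j in range(i + 1, n):
            # order the pair so that `a` has the earlier observed time
            a, b = (i, j) if time[i] < time[j] else (j, i)
            if time[a] == time[b]:
                continue  # tied times skipped for simplicity in this sketch
            if not event[a]:
                continue  # earlier time is censored -> pair not usable
            usable += 1
            if risk[a] > risk[b]:
                concordant += 1
            elif risk[a] == risk[b]:
                ties += 1
    return (concordant + 0.5 * ties) / usable
```

A perfect risk ranking (`risk` exactly reversed relative to survival time) gives C = 1, a perfectly wrong one gives C = 0, and ranger's OOB error is then one minus this quantity.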

I have the actual survival times. But how do I get the predicted survival times, or a rank-ordering of my observations by expected survival time? The random forest itself returns estimated hazard and survival functions. My first thought was to turn each estimated survival function into a pmf of survival time and calculate the expected survival time from that. In practice, however, this seems intractable, at least without making non-trivial assumptions, because the estimated survival functions are usually truncated (e.g. the survival probability never reaches 0 but is instead cut off around 0.2 or some other non-trivial value).
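To see why truncation is a problem, here is a sketch with a made-up survival curve (in Python, with hypothetical numbers): the implied pmf only accounts for the probability mass up to the last estimated time point, so the expected survival time is underdetermined and only a lower bound is available without further assumptions.

```python
import numpy as np

# Hypothetical estimated survival curve, truncated: S(t) never reaches 0.
times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
surv = np.array([0.9, 0.7, 0.5, 0.35, 0.25])

# pmf implied by the curve: P(T = t_k) = S(t_{k-1}) - S(t_k), with S(t_0) = 1
pmf = -np.diff(np.concatenate(([1.0], surv)))

# The pmf sums to 1 - S(t_max) = 0.75: a quarter of the probability mass
# lies beyond the last estimated time point.
mass = pmf.sum()

# Without assumptions about the tail, only a lower bound on E[T] is available:
# all the missing mass placed at t_max.
lower_bound = (times * pmf).sum() + times[-1] * surv[-1]
```

This is exactly the "non-trivial assumptions" issue: any expected survival time depends on how the missing 0.25 of mass beyond the truncation point is distributed.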

Is there some other way of comparing (truncated) survival or hazard functions to create a ranking of estimated survival times?

Admittedly, my question is essentially the same as this one, but I don't think there is an adequate response there either.

I believe I found the answer I'm looking for here (equation 5, under Prediction Error).

The statistic used to compare outcomes is $\mathcal{M}_i = \sum_{k=1}^{M} \hat{H}_e(t_k \mid X_i)$, where $t_1, \ldots, t_M$ are the unique times in the data, $\hat{H}_e$ is the cumulative hazard estimate, and $X_i$ is the vector of covariates for observation $i$.
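In other words, each observation's estimated cumulative hazard function is summed over the unique event times, and that single "mortality" number provides the risk ranking the C-index needs; no expected survival time is required, so truncation is no longer an issue. A sketch with made-up numbers (in Python; in ranger the per-observation cumulative hazard matrix should be obtainable from the prediction object, e.g. its `chf` component, though check the package documentation):

```python
import numpy as np

# Hypothetical ensemble cumulative hazard estimates H(t_k | X_i):
# one row per observation i, one column per unique time t_1, ..., t_M.
chf = np.array([
    [0.10, 0.30, 0.60, 1.10],  # observation 1
    [0.20, 0.50, 0.90, 1.60],  # observation 2
    [0.05, 0.10, 0.20, 0.40],  # observation 3
])

# Mortality statistic: M_i = sum over k of H(t_k | X_i)
mortality = chf.sum(axis=1)

# Higher mortality = higher predicted risk = shorter expected survival,
# so sorting by descending M_i ranks observations from riskiest to safest.
order = np.argsort(-mortality)
```

Here observation 2 is ranked riskiest and observation 3 safest; feeding `mortality` in as the risk score (together with the observed times and censoring indicators) reproduces the ranking underlying ranger's one-minus-C OOB error.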
