In this case study I have to assume a baseline Weibull distribution, and I'm fitting an Accelerated Failure Time model, which I will later interpret in terms of both hazard ratios and survival times.
The data look like this.
    head(data1.1)
        TimeSurv IndSurv Treat Age
    1     6 days       1     D  27
    2    33 days       1     D  43
    3   361 days       1     I  36
    4   488 days       1     I  54
    5   350 days       1     D  49
    6   721 days       1     I  49
    7  1848 days       0     D  32
    8   205 days       1     D  47
    9   831 days       1     I  24
    10  260 days       1     I  38
I'm fitting a model using the function WeibullReg() in R. The survival object is built with TimeSurv as the observed time and IndSurv as the censoring indicator. The covariates considered are Treat and Age.
My issue is understanding the output properly:
    wei1 = WeibullReg(Surv(TimeSurv, IndSurv) ~ Treat + Age, data=data1.1)
    wei1

    $formula
    Surv(TimeSurv, IndSurv) ~ Treat + Age

    $coef
                Estimate           SE
    lambda  0.0009219183 0.0006803664
    gamma   0.9843411517 0.0931305471
    TreatI -0.5042111027 0.2303038312
    Age     0.0180225253 0.0089632209

    $HR
                  HR       LB       UB
    TreatI 0.6039819 0.384582 0.948547
    Age    1.0181859 1.000455 1.036231

    $ETR
                 ETR        LB        UB
    TreatI 1.6690124 1.0574337 2.6343045
    Age    0.9818574 0.9644488 0.9995801

    $summary

    Call:
    survival::survreg(formula = formula, data = data, dist = "weibull")
                   Value Std. Error     z      p
    (Intercept)  7.10024    0.41283 17.20 <2e-16
    TreatI       0.51223    0.23285  2.20  0.028
    Age         -0.01831    0.00913 -2.01  0.045
    Log(scale)   0.01578    0.09461  0.17  0.868

    Scale= 1.02

    Weibull distribution
    Loglik(model)= -599.1   Loglik(intercept only)= -604.1
        Chisq= 9.92 on 2 degrees of freedom, p= 0.007
    Number of Newton-Raphson Iterations: 5
    n= 120
I don't really get how Scale = 1.02 and Log(scale) = 0.015 are related. Also, the p-value of this Log(scale) is large and non-significant. Given the conversions that the function's documentation describes, am I to assume that the values of the alphas are also not to be trusted (considering they were obtained using the scale value)?
Best Answer
Many (including me) get confused by the different ways to define the parameters of a Weibull distribution, particularly since the standard R Weibull-related functions in the stats package and the survreg() parametric fitting function in the survival package use different parameterizations.
The manual page for the R Weibull-related functions in stats says:
The Weibull distribution with shape parameter $a$ and scale parameter $b$ has density given by $$\frac{a}{b}\left(\frac{x}{b}\right)^{a-1}e^{-(x/b)^{a}}$$ for $x > 0$.
That's called the "standard parameterization" on the Wikipedia page (where they use $k$ for shape and $\lambda$ for scale).
The survreg() function uses a different parameterization, with differences explained on its manual page:
There are multiple ways to parameterize a Weibull distribution. The survreg function embeds it in a general location-scale family, which is a different parameterization than the rweibull function, and often leads to confusion.
survreg's scale = 1/(rweibull shape)
survreg's intercept = log(rweibull scale).
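The mapping quoted above can be checked on simulated data. The following is a minimal sketch, not part of the original analysis: the shape and scale values, the sample size, and the variable names are all illustrative assumptions, and the data are simulated without censoring for simplicity.

```r
library(survival)  # provides Surv() and survreg()

set.seed(1)
true_shape <- 2    # rweibull "shape" (illustrative assumption)
true_scale <- 100  # rweibull "scale" (illustrative assumption)

# Simulated, fully observed event times (status = 1 means no censoring)
sim <- data.frame(
  time   = rweibull(5000, shape = true_shape, scale = true_scale),
  status = 1
)

# Intercept-only Weibull fit in survreg's location-scale parameterization
fit <- survreg(Surv(time, status) ~ 1, data = sim, dist = "weibull")

est_shape <- 1 / fit$scale           # survreg's scale = 1/(rweibull shape)
est_scale <- unname(exp(coef(fit)))  # survreg's intercept = log(rweibull scale)

c(shape = est_shape, scale = est_scale)
```

The back-transformed estimates should land close to the values used for simulation, which is an easy way to confirm which parameterization a given output is reporting.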
The WeibullReg() function effectively takes the result from survreg() and expresses the results in terms of the "standard parameterization."
There is a potential confusion, however, as the $summary of the object produced by WeibullReg is "the summary table from the original survreg model." (Emphasis added.) So what you have displayed in the question includes results for both parameterizations.
That dual representation of the results helps explain what's going on.
Starting from the bottom, the survreg value of scale is the reciprocal of the "standard parameterization" value of shape. The "standard" shape parameter is called gamma in the WeibullReg $coef output near the top of your output. The value for gamma is 0.98434, with a reciprocal of 1.0159, rounding to the value of 1.02 shown as Scale near the bottom of your output. The natural logarithm of 1.0159 is 0.01578, shown as Log(scale) in the row just above it. Those last lines of your output, remember, are based on the survreg definition of scale.
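That arithmetic can be reproduced directly from the numbers in the question. This is only a sketch: the value of gamma is copied from the $coef output, and the variable names are made up for illustration.

```r
gamma_hat <- 0.9843411517        # "gamma" row of $coef in the question's output

survreg_scale <- 1 / gamma_hat   # survreg's scale is the reciprocal of the shape
log_scale <- log(survreg_scale)  # survreg reports this as Log(scale)

round(survreg_scale, 4)  # 1.0159, printed by survreg as "Scale= 1.02"
round(log_scale, 5)      # 0.01578
```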
The p-value for that Log(scale) is indeed very high. But that just means that the value of Log(scale) is not significantly different from 0, or that the scale itself (as defined in survreg) is not significantly different from 1. That says nothing about the reliability of the hazard ratios and so forth for the covariates. It just means that the baseline survival curve of your Weibull model can't be statistically distinguished from a simple exponential survival curve, which would have exactly a value of 1 for survreg scale or "standard" shape and a constant baseline hazard over time. So there is nothing to distrust about your results on that basis.
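To make this concrete, the sketch below recomputes the Log(scale) test and a few of the converted quantities from the survreg table in the question. It assumes the usual Weibull conversions (lambda = exp(-intercept/scale), hazard ratio = exp(-coefficient/scale), event time ratio = exp(coefficient)); these reproduce the numbers shown in $coef, $HR and $ETR, but the variable names are illustrative and this is not the package's own code.

```r
# Values copied from the survreg summary table in the question
intercept    <- 7.10024
b_treat      <- 0.51223
b_age        <- -0.01831
log_scale    <- 0.01578
se_log_scale <- 0.09461

sigma <- exp(log_scale)  # survreg's scale

# Wald test of Log(scale) = 0, i.e. scale = 1 (exponential baseline)
z <- log_scale / se_log_scale
p <- 2 * pnorm(-abs(z))

# Conversions to the "standard" parameterization and to effect measures
lambda    <- exp(-intercept / sigma)  # compare with lambda in $coef
hr_treat  <- exp(-b_treat / sigma)    # compare with TreatI in $HR
hr_age    <- exp(-b_age / sigma)      # compare with Age in $HR
etr_treat <- exp(b_treat)             # compare with TreatI in $ETR

round(c(z = z, p = p, HR_TreatI = hr_treat, HR_Age = hr_age, ETR_TreatI = etr_treat), 3)
```

The hazard ratios do depend on the estimated scale, but only through its value. The large p-value says that this value is statistically compatible with 1, not that it is poorly estimated.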