I have done a survival analysis using the Kaplan-Meier estimator.
Description of data:
My data set is large: the data table has close to 120,000 records of survival information belonging to 6 groups.
Sample:
       user_id time_in_days event total_likes total_para_length group
    1:       2         4657     1       38867         431117212    AA
    2:       2         3056     1       31392         948984460    BB
    3:       2           49     1          15             67770    CC
    4:       3         4181     1       15778         379211806    BB
    5:       3           17     1           3             19032    CC
    6:       3         2885     1       12001         106259666    EE
After fitting the survival curves and plotting them, I see that they are similar, yet at any given point in time their survival proportions do not look identical.
Here is the plot:
I ran a hypothesis test with H0: there is no difference between the survival curves. Here are the results I got:
    > survdiff(formula= Surv(time, event) ~ group, rh=0)
    Call:
    survdiff(formula = Surv(time, event) ~ group, rho = 0)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
    group=FF 28310    27993    28632      14.3      19.0
    group=AA 64732    63984    67853     220.6     460.1
    group=BB 19017    18690    16839     203.4     245.6
    group=CC  9687     9536     8699      80.6      91.0
    group=DD 13438    13187    11891     141.3     164.2
    group=EE  3910     3847     3324      82.4      89.7

     Chisq= 788  on 5 degrees of freedom, p= 0
I am a little confused about what this means, especially since I got a p-value of 0.
I am fairly new to survival analysis. After reading around, my understanding is that this test is non-parametric, which means it makes no assumptions about the underlying distribution of the survival times.
After reading about the Cox proportional hazards model and going over the CRAN PDF documentation, I fitted a Cox regression. Here is what I got:
    > cox_model <- coxph(Surv(time, event) ~ X)
    > summary(cox_model)
    Call:
    coxph(formula = Surv(time, event) ~ X)

      n= 139094, number of events= 137237

             coef  exp(coef)   se(coef)       z Pr(>|z|)
    X1 -7.655e-05  9.999e-01  1.504e-06 -50.897   <2e-16 ***
    X2 -1.649e-10  1.000e+00  5.715e-11  -2.886   0.0039 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

       exp(coef) exp(-coef) lower .95 upper .95
    X1    0.9999          1    0.9999    0.9999
    X2    1.0000          1    1.0000    1.0000

    Concordance= 0.847  (se = 0.001 )
    Rsquare= 0.111   (max possible= 1 )
    Likelihood ratio test= 16307  on 2 df,   p=0
    Wald test            = 7379  on 2 df,   p=0
    Score (logrank) test = 4628  on 2 df,   p=0
My matrix X is generated by calling rbind on total_likes and total_para_length. Looking at the Rsquare and the p-values, I am not sure what is really going on here. If the null hypothesis could not be rejected, I would expect to see a larger p-value.
Best Answer
Your $p$-value is not actually zero, it's just very close to it. If you look at your test statistics ($\chi^{2}=788$ in the log-rank test comparing the Kaplan-Meier curves, and the Wald $\chi^{2}=7379$ in the Cox proportional hazards (CPH) model), they are ginormous! Just really, really big. So the associated $p$-values are tiny, and the software prints them as 0.
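To get a feel for how small these $p$-values are, the upper-tail chi-square probabilities can be computed directly. This is a minimal sketch in base R that uses only the statistics and degrees of freedom printed in the question's output. The log scale is used for the Wald statistic because its $p$-value is too small to store as an ordinary floating-point number.

```r
# Log-rank statistic from the survdiff output: chi-square = 788 on 5 df
p_logrank <- pchisq(788, df = 5, lower.tail = FALSE)

# Wald statistic from the coxph output: chi-square = 7379 on 2 df.
# On the natural scale this underflows, so also ask for log(p).
p_wald     <- pchisq(7379, df = 2, lower.tail = FALSE)
log_p_wald <- pchisq(7379, df = 2, lower.tail = FALSE, log.p = TRUE)

print(p_logrank)             # tiny, but strictly greater than zero
print(p_wald)                # too small to represent as a double
print(log_p_wald / log(10))  # base-10 exponent of the Wald p-value
```

For 2 degrees of freedom the upper tail is exactly $e^{-x/2}$, so the log of the Wald $p$-value is $-7379/2$.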
But perhaps you wonder why, if your survival curves look so similar, you get such a significant difference with these tests? If so, consider: even tiny differences can produce minuscule $p$-values if the sample size is large enough. And how big is your sample? It's about 120,000 observations: big! So you should expect that even very small differences will be found "significant."
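The effect of sample size can be illustrated with a small simulation. The sketch below is not based on the question's data: it assumes two hypothetical groups with exponential survival times whose hazards differ by only 5%, no censoring, and it runs the same log-rank test (`survdiff` with `rho = 0`) on a small and a large sample. The function name `simulate_logrank` and the rates are made up for illustration.

```r
library(survival)  # provides Surv() and survdiff()

set.seed(1)

# Hypothetical data: two groups with exponential survival times whose
# hazards differ by only 5% (rates 1.00 vs 1.05); every subject has an event.
simulate_logrank <- function(n_per_group) {
  time  <- c(rexp(n_per_group, rate = 1.00),
             rexp(n_per_group, rate = 1.05))
  event <- rep(1, 2 * n_per_group)
  group <- rep(c("A", "B"), each = n_per_group)
  fit   <- survdiff(Surv(time, event) ~ group, rho = 0)
  p     <- pchisq(fit$chisq, df = 1, lower.tail = FALSE)
  c(chisq = fit$chisq, p = p)
}

small <- simulate_logrank(200)    # a few hundred observations
large <- simulate_logrank(60000)  # roughly the scale of the question's sample

print(rbind(small, large))
```

The underlying difference between the groups is the same in both runs; only the amount of data changes, and with it the size of the test statistic.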
What can you do with this inference, given that it is probably just telling you that you have a giant sample? I'm not sure, because I don't know if there's an equivalence test available for your quantities. If there is an equivalence test for your estimates, you might (1) decide a priori what a relevant difference is (i.e. how large a difference between two groups needs to be in order for you to care), (2) conduct a test of difference between your two groups, (3) conduct a test of equivalence using your definition of relevant difference, and (4) combine the inferences from these two tests, which will give you an idea of whether the significant difference you are finding is relevant (i.e. you rejected the test for difference, but did not reject the test for equivalence) or trivial (you rejected the test for difference and also rejected the test for equivalence).
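The four steps above can be sketched as follows. This is only one possible approach and is an assumption on my part, not an established recipe for this problem: it applies two one-sided Wald tests to the log hazard ratio from a two-group Cox model, which also assumes proportional hazards. The data are simulated, and the equivalence margin of 1.25 on the hazard-ratio scale is an arbitrary choice for illustration.

```r
library(survival)  # provides Surv() and coxph()

set.seed(1)

# Hypothetical data: two groups, hazards differing by 5%, no censoring.
n     <- 60000
time  <- c(rexp(n, rate = 1.00), rexp(n, rate = 1.05))
event <- rep(1, 2 * n)
group <- factor(rep(c("A", "B"), each = n))

fit    <- coxph(Surv(time, event) ~ group)
log_hr <- unname(coef(fit))       # estimated log hazard ratio, B vs A
se     <- sqrt(vcov(fit)[1, 1])   # its standard error

# (1) Relevant difference chosen a priori (an assumption for illustration):
#     hazard ratios between 1/1.25 and 1.25 count as equivalent.
delta <- log(1.25)

# (2) Test of difference, H0: log HR = 0 (two-sided Wald test)
p_diff <- 2 * pnorm(-abs(log_hr / se))

# (3) Test of equivalence, H0: |log HR| >= delta (two one-sided tests);
#     the overall p-value is the larger of the two one-sided p-values.
p_lower <- pnorm((log_hr + delta) / se, lower.tail = FALSE)  # H0: log HR <= -delta
p_upper <- pnorm((log_hr - delta) / se, lower.tail = TRUE)   # H0: log HR >=  delta
p_equiv <- max(p_lower, p_upper)

# (4) Combine the two inferences
alpha   <- 0.05
verdict <- if (p_diff < alpha && p_equiv < alpha) {
  "trivial difference"
} else if (p_diff < alpha) {
  "relevant difference"
} else if (p_equiv < alpha) {
  "equivalent"
} else {
  "indeterminate"
}

print(c(hazard_ratio = exp(log_hr), p_diff = p_diff, p_equiv = p_equiv))
print(verdict)
```

With six groups, as in the question, the same idea would have to be applied to each pairwise comparison of interest, and the margin `delta` has to come from subject-matter knowledge rather than from the data.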