I have done a survival analysis using the Kaplan-Meier estimator.

Description of data:

My data set is large: the data table has close to 120,000 records of survival information belonging to 6 groups.

Sample:

```
   user_id time_in_days event total_likes total_para_length group
1:       2         4657     1       38867         431117212    AA
2:       2         3056     1       31392         948984460    BB
3:       2           49     1          15             67770    CC
4:       3         4181     1       15778         379211806    BB
5:       3           17     1           3             19032    CC
6:       3         2885     1       12001         106259666    EE
```

After fitting the survival curves and plotting them, I see that they are similar, yet at any given point in time their survival proportions do not look identical.

Here is the plot:

I ran a hypothesis test with H0: there is no difference between the survival curves. Here are the results I got:

```
> survdiff(formula= Surv(time, event) ~ group, rh=0)
Call:
survdiff(formula = Surv(time, event) ~ group, rho = 0)

              N Observed Expected (O-E)^2/E (O-E)^2/V
group=FF  28310    27993    28632      14.3      19.0
group=AA  64732    63984    67853     220.6     460.1
group=BB  19017    18690    16839     203.4     245.6
group=CC   9687     9536     8699      80.6      91.0
group=DD  13438    13187    11891     141.3     164.2
group=EE   3910     3847     3324      82.4      89.7

 Chisq= 788  on 5 degrees of freedom, p= 0
```

I am a little confused trying to figure out what this means, especially since I got `p-value=0`.

I am fairly new to survival analysis. After reading and digging around, I understand that this is a non-parametric method, which means it does not make any assumptions about the underlying distribution of the survival times.

After reading about the Cox proportional hazards model and going over the CRAN PDF, I fit a Cox regression. Here is what I got from that:

```
> cox_model <- coxph(Surv(time, event) ~ X)
> summary(cox_model)
Call:
coxph(formula = Surv(time, event) ~ X)

  n= 139094, number of events= 137237

         coef  exp(coef)   se(coef)       z Pr(>|z|)
X1 -7.655e-05  9.999e-01  1.504e-06 -50.897   <2e-16 ***
X2 -1.649e-10  1.000e+00  5.715e-11  -2.886   0.0039 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

   exp(coef) exp(-coef) lower .95 upper .95
X1    0.9999          1    0.9999    0.9999
X2    1.0000          1    1.0000    1.0000

Concordance= 0.847  (se = 0.001 )
Rsquare= 0.111   (max possible= 1 )
Likelihood ratio test= 16307  on 2 df,   p=0
Wald test            = 7379  on 2 df,   p=0
Score (logrank) test = 4628  on 2 df,   p=0
```

My big X is generated by doing rbind on `total_likes` and `total_para_length`. Looking at the Rsquare and the p-values, I am not sure what is really going on here. If the null hypothesis could not be rejected, I would expect to get a larger p-value.


#### Best Answer

Your $p$-value is not actually zero, it's just very close to it. If you look at your test statistics ($\chi^{2}=788$ in the log-rank test on the Kaplan-Meier curves, and the Wald $\chi^{2}=7379$ in the CPH model) they are *ginormous!* Just really, *really* big. So the associated $p$-values are tiny, and they are displayed as zero.
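To see that the printed zero is a display effect, here is a minimal sketch that recomputes the tail probabilities with base R's `pchisq`. The variable names are illustrative, and the statistics are the rounded values from the printed output, so the results are only approximate.

```r
# Upper-tail p-value for the log-rank statistic (chi-square = 788 on 5 df).
# This is tiny but still representable as a double.
p_logrank <- pchisq(788, df = 5, lower.tail = FALSE)

# For the Wald statistic (7379 on 2 df) the p-value is smaller than the
# smallest positive double, so the direct computation underflows to 0.
p_wald <- pchisq(7379, df = 2, lower.tail = FALSE)

# The same tail probability can still be obtained on the natural-log scale
# and then converted to a base-10 exponent for reading.
log_p_wald <- pchisq(7379, df = 2, lower.tail = FALSE, log.p = TRUE)
log10_p_wald <- log_p_wald / log(10)

print(p_logrank)
print(p_wald)
print(log10_p_wald)
```

The log-scale value makes the point concrete: the $p$-value is positive, just far too small to print, or in the Wald case even to store as an ordinary floating-point number.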

But perhaps you wonder why, if your survival curves were so similar visually, you get such a significant difference with these tests? If so, consider: even tiny differences can obtain minuscule $p$-values if the sample size is large enough. And how big is your sample? It's about 120,000 observations: *big*! So you might expect that even very small differences will be found "significant."
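A small simulation can illustrate the sample-size effect. This is only a sketch on made-up data, not the poster's: it assumes two groups with exponential survival times, no censoring, and a hypothetical hazard ratio of 1.05, and the helper name `sim_logrank_p` is invented for the example. It runs the same log-rank test (`survdiff`) at a small and at a large sample size.

```r
library(survival)

set.seed(1)

# Simulate two groups whose hazards differ by a small factor and
# return the log-rank p-value. All settings here are illustrative.
sim_logrank_p <- function(n_per_group, hazard_ratio = 1.05) {
  time  <- c(rexp(n_per_group, rate = 1),
             rexp(n_per_group, rate = hazard_ratio))
  group <- rep(c("A", "B"), each = n_per_group)
  event <- rep(1, 2 * n_per_group)  # every subject has the event (no censoring)
  fit   <- survdiff(Surv(time, event) ~ group)
  # Chi-square test on (number of groups - 1) degrees of freedom.
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

p_small <- sim_logrank_p(200)    # 400 subjects in total
p_large <- sim_logrank_p(60000)  # 120,000 subjects in total

print(p_small)
print(p_large)
```

The underlying difference between the groups is identical in both runs; only the number of observations changes, and with it the power of the test to flag that difference.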

What can you do with this inference, given that it is probably just telling you that you've a giant sample? I'm not sure, because I don't know if there's an equivalence test available for your two quantities. If there *is* an equivalence test for your estimates, you might (1) decide *a priori* what a *relevant difference* is (i.e. how large a difference between two groups needs to be, in order for you to care), (2) conduct a test of difference between your two groups, (3) conduct a test of *equivalence* using your definition of relevant difference, and (4) *combine* the inferences from these two tests, which will give you an idea of whether the significant difference you are finding is *relevant* (i.e. you rejected the null of no difference, but did not reject the equivalence test's null of a relevant difference) or *trivial* (you rejected the null of no difference and also rejected the equivalence test's null, so a difference exists but it is smaller than what you decided to care about).
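As a rough sketch of how the four steps above could look in practice, the example below uses one possible approach that is an assumption of this illustration, not something the answer establishes: two one-sided Wald tests (TOST) on the log hazard ratio from a two-group Cox model. The data are simulated, and the equivalence margin (a hazard ratio between 1/1.10 and 1.10) is a hypothetical choice standing in for step (1).

```r
library(survival)

set.seed(2)

# Simulated two-group data with a small true hazard ratio (illustrative only).
n     <- 50000
group <- rep(c(0, 1), each = n)
time  <- rexp(2 * n, rate = ifelse(group == 1, 1.04, 1))
event <- rep(1, 2 * n)  # no censoring in this toy example

fit    <- coxph(Surv(time, event) ~ group)
log_hr <- unname(coef(fit)["group"])
se     <- sqrt(vcov(fit)["group", "group"])

# Step 1: relevant difference, chosen a priori (hypothetical margin).
delta <- log(1.10)

# Step 2: test of difference, H0: log hazard ratio = 0.
p_diff <- 2 * pnorm(-abs(log_hr / se))

# Step 3: test of equivalence via two one-sided Wald tests.
p_lower <- pnorm((log_hr + delta) / se, lower.tail = FALSE)  # H0: log HR <= -delta
p_upper <- pnorm((log_hr - delta) / se)                      # H0: log HR >= +delta
p_equiv <- max(p_lower, p_upper)

# Step 4: combine the two inferences.
alpha <- 0.05
verdict <- if (p_diff < alpha && p_equiv < alpha) {
  "trivial difference"
} else if (p_diff < alpha) {
  "relevant difference"
} else if (p_equiv < alpha) {
  "equivalent"
} else {
  "indeterminate"
}

print(c(p_diff = p_diff, p_equiv = p_equiv))
print(verdict)
```

With six groups, something like this would have to be repeated per pair of groups or per contrast, and whether a Wald-based margin on the hazard ratio is the right notion of "relevant" depends on the application.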
