Solved – Big sample size, small coefficients, significant results: how should I proceed?

I did some quantitative research using rank-ordered logistic regression in Stata. The independent variables have p-values of almost 0, which suggests they have a significant effect on the dependent variable. But since the sample size is big (35,000 records) and the coefficients are very small (e.g. 0.0001), I suspect there is no real relationship, because when the sample size is this big, almost everything becomes significant.
I also tested the model with only 5,000 records and still got significant results.
What do you recommend? Should I use a smaller sample so that the reviewers of my paper will not point out the problem of a big sample size? Or is there another way to report my results and show that the variables do in fact have a significant effect?
I would appreciate any help.
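The intuition that a large sample makes even a tiny effect significant can be illustrated in a deliberately simplified setting. The sketch below assumes a one-sample z-test with unit variance, so the test statistic is z = effect × √n, and holds a hypothetical standardized effect of 0.05 fixed while n grows. It is not the rank-ordered logit model from the question, and the name `two_sided_p` is illustrative only.

```python
import math


def two_sided_p(effect: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test for a fixed standardized effect.

    Assumption (illustration only): unit variance, so z = effect * sqrt(n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = abs(effect) * math.sqrt(n)
    # P(|Z| >= z) for a standard normal Z equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))


# Hypothetical tiny effect, held constant while only the sample size changes
effect = 0.05
for n in (100, 5_000, 35_000):
    print(f"n = {n:>6}: p = {two_sided_p(effect, n):.3g}")
```

Under these assumptions the same effect gives p of roughly 0.62 at n = 100, below 0.001 at n = 5,000, and a vanishingly small p at n = 35,000: the effect size never changes, only the sample size does.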

I think this has been asked before. It's useful to realize that, without a prespecified sample size and alpha level, the $p$-value is just a measure of the sample size you ultimately wind up with. That is not appealing. An approach I use is this: ask at what sample size a 0.05 level would be appropriate, then scale accordingly. For instance, I feel the 0.05 level is often suited to problems with about 100 observations; that is, I would say "wow, that is an interesting finding" if it had a 1 in 20 chance of being a false positive. If you had a sample size of 5,000, that's 50 times larger than 100, so divide the 0.05 level by 50 to get 0.001 as a significance level. This is in line with what Fisher advocated: don't use p-values as a mechanical significance test; compare them to the power of the study. The sample size is the simplest, rawest measure of a study's power. An overpowered study with a conventional 0.05 cutoff makes absolutely no sense.
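The scaling rule of thumb above amounts to α = 0.05 × 100 / n. A minimal sketch, assuming the answer's reference point of 0.05 at 100 observations (the function name `scaled_alpha` and its defaults are illustrative, not from any library):

```python
def scaled_alpha(n: int, base_alpha: float = 0.05, base_n: int = 100) -> float:
    """Scale a significance level inversely with sample size.

    Treats base_alpha as appropriate at base_n observations and divides it
    by n / base_n, i.e. alpha = base_alpha * base_n / n.
    """
    if n <= 0 or base_n <= 0:
        raise ValueError("sample sizes must be positive")
    return base_alpha * base_n / n


for n in (100, 5_000, 35_000):
    print(f"n = {n:>6}: alpha = {scaled_alpha(n):.6f}")
```

For the 5,000 records in the example this reproduces 0.001; for the question's 35,000 records it gives 0.05 / 350, about 0.00014.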

It is generally inadvisable to choose a significance cutoff after viewing the data and results. One might believe it is kosher to arbitrarily choose a more stringent significance criterion post hoc, but it only deceives readers into thinking you ran a better-controlled trial than you did. Think of it this way: if you had observed p = 0.04, you wouldn't be asking this question; the analysis would be a tidy inferential package.

Another way to look at it: just report the CI and state that the analysis was statistically significant. For instance, you might have a 95% CI for a hazard ratio of (0.01, 0.16), where the null value is 1. It suffices to say that the p-value is really freakin' small, so you don't need to clutter the page with p = 0.0000000023. (Don't do that. Only report p to the precision you display: with 3 decimal places, write p < 0.001, and never round to p = 0.000, which shows you don't know what a p-value means.)
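The reporting rule above (fixed precision, a "less than" bound below it, never "0.000") can be sketched as a small helper. The name `format_p` and its `digits` parameter are hypothetical, chosen only for this illustration:

```python
def format_p(p: float, digits: int = 3) -> str:
    """Format a p-value to a fixed number of decimals without printing 0.000.

    Values below 10**-digits are reported as a bound, e.g. "p < 0.001".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    floor = 10 ** (-digits)
    if p < floor:
        return f"p < {floor:.{digits}f}"
    return f"p = {p:.{digits}f}"


print(format_p(2.3e-9))   # reported as a bound rather than 0.000
print(format_p(0.0412))   # rounded to three decimals
```

Any p at or above the floor rounds to at least the floor value, so the helper can never emit "p = 0.000".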
