I'm in a debate with a coworker and I'm starting to wonder if I'm wrong but the internet is confusing me more.

We have continuous data on $[0, \infty)$ that is retrospectively selected on individuals. The selection is non-random. Our sample sizes are $\approx 1000$. Our data is heavily skewed towards the left with some strong bumps towards the tail.

My strategy is to look at the distribution of the data before statistical tests between two groups via histograms, q-q plots, and Shapiro Wilk test. If the data is approximately normal I use an appropriate test (t-test, ANOVA, Linear Regression etc). If not I use an appropriate non-parametric method (Mann-Whitney Test, Kruskal-Wallis, Bootstrap regression model).

My coworker doesn't look at the distribution: if the sample size is >30 or >50, he automatically assumes it is normal and cites the central limit theorem to justify using the t-test or ANOVA.

They cite this paper: *t-tests, non-parametric tests, and large studies—a paradox of statistical practice?* and say that I'm over-using non-parametric tests. My understanding is that my method would tell me whether it's appropriate to assume a normal distribution, because I thought that for heavily skewed data the n needed to reach an approximately normal distribution was higher. I know that given a large enough sample size it would eventually get there, but especially for the smaller sample sizes, isn't it better to check? To me it makes sense that if multiple checks show the data isn't normal, it's inappropriate to assume a normal distribution.

Also, if a sample size of 30 were all you needed to assume normality, why is so much work done on other distributions in statistical software? Everything would be either normal or non-parametric. Why bother with binomial or gamma distributions? However, they keep sending me papers about the central limit theorem and now I'm not so sure. Maybe I am wrong and I shouldn't bother checking these assumptions.

Who is right and why?


#### Best Answer

> My strategy is to look at the distribution of the data before statistical tests between two groups via histograms, q-q plots, and Shapiro Wilk test. If the data is approximately normal I use an appropriate test (t-test, ANOVA, Linear Regression etc). If not I use an appropriate non-parametric method (Mann-Whitney Test, Kruskal-Wallis, Bootstrap regression model).

What is 'approximately normal'? Do you *need* to pass a hypothesis test for the data to be sufficiently close to normal?

A problem is that those tests for normality become more powerful (more likely to reject normality) as the sample size increases, and can reject even for very small deviations. Ironically, for larger sample sizes deviations from normality are less important.
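The effect of sample size on a normality test can be sketched with a small simulation. This is only an illustration under stated assumptions: it uses the Jarque-Bera test instead of Shapiro-Wilk (because its p-value has a closed form and needs no extra libraries), and the mildly skewed Gamma(shape 50) data, the sample sizes and the helper names are all made up for the example.

```python
import math
import random


def jarque_bera_pvalue(xs):
    """Asymptotic p-value of the Jarque-Bera normality test (not Shapiro-Wilk)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # JB is compared to a chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(n, reps=200, alpha=0.05, shape=50.0, seed=0):
    """Share of simulated samples of size n in which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Assumed data: Gamma(shape=50), which is only mildly skewed.
        sample = [rng.gammavariate(shape, 1.0) for _ in range(n)]
        if jarque_bera_pvalue(sample) < alpha:
            rejections += 1
    return rejections / reps


rates = {n: rejection_rate(n) for n in (50, 500, 2000)}
for n, rate in rates.items():
    print(f"n={n:5d}  normality rejected in {rate:.0%} of samples")
```

The data-generating distribution is the same in every row; only the sample size changes, so any increase in the rejection rate reflects the power of the test rather than a larger deviation from normality.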

> My coworker doesn't look at the distribution if the sample is >30 or >50 he automatically assumes it is normal and cites the central limit theorem for using the t-test or ANOVA.

> Can we ALWAYS assume normal distribution if n >30?

It is a bit strong to say 'always'. Also, it is not correct to say that normality can be assumed; instead, we can say that the impact of the deviation from normality can be negligible.
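How quickly that impact shrinks can be sketched by looking at the skewness of the sample mean itself. The example below is a rough illustration and assumes Exp(1) data, which are not from the question; for that distribution the skewness of the mean of n draws is $2/\sqrt{n}$. The function names and the number of replications are arbitrary choices.

```python
import math
import random


def sample_skewness(xs):
    """Moment-based sample skewness."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_mean(n, reps=20000, seed=1):
    """Simulated skewness of the mean of n Exp(1) observations."""
    rng = random.Random(seed)
    # Shortcut: the sum of n independent Exp(1) draws is Gamma(n, 1),
    # so the sample mean is Gamma(shape=n, scale=1/n) and can be drawn directly.
    means = [rng.gammavariate(n, 1.0 / n) for _ in range(reps)]
    return sample_skewness(means)


skews = {n: skewness_of_sample_mean(n) for n in (5, 30, 1000)}
for n, s in skews.items():
    print(f"n={n:5d}  simulated skewness={s:.3f}  theory={2 / math.sqrt(n):.3f}")
```

The skewness of the mean decreases with n but never becomes exactly zero, which is why "negligible impact" is a more accurate description than "normality can be assumed". How small is small enough depends on the data and on the question being asked.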

The problem that the article by Morten W. Fagerland addresses is not whether the t-test works if n>30 (it does not work so well for n=30, which can also be seen in the graph, and it requires large numbers, like the sample size of 1000 used in their table). The problem is that a non-parametric test like Wilcoxon-Mann-Whitney (WMW) is not the right solution, because WMW answers a *different* question. The WMW test is *not* a test for equality of means or medians.
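A minimal sketch of what "a different question" means: two groups with the same mean but different shapes. Everything here is assumed for illustration (the two gamma distributions, the seed, the helper names), and both tests use large-sample normal approximations, which is reasonable at n = 1000 but is not how a statistics package would compute exact p-values.

```python
import math
import random
import statistics


def two_sided_normal_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wmw_test(x, y):
    """WMW test via the normal approximation (assumes no ties).

    Returns the estimated P(X > Y) and the two-sided p-value.
    """
    n1, n2 = len(x), len(y)
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(labelled, start=1) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0  # number of pairs with x > y
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u / (n1 * n2), two_sided_normal_p(z)


def welch_z_test(x, y):
    """Unequal-variance comparison of means, normal approximation to the t."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return two_sided_normal_p(z)


rng = random.Random(42)
n = 1000
# Assumed example data: both groups have mean 1 but very different shapes.
x = [rng.gammavariate(0.5, 2.0) for _ in range(n)]  # strongly right-skewed
y = [rng.gammavariate(5.0, 0.2) for _ in range(n)]  # much more symmetric

prob_x_gt_y, p_wmw = wmw_test(x, y)
p_means = welch_z_test(x, y)
mean_diff = statistics.fmean(x) - statistics.fmean(y)

print(f"difference in sample means: {mean_diff:.3f} (p = {p_means:.3g})")
print(f"estimated P(X > Y): {prob_x_gt_y:.3f} (WMW p = {p_wmw:.3g})")
```

The population means are equal by construction, yet the estimated P(X > Y) is far from 0.5, so the WMW test rejects. It is detecting that one group tends to produce larger values than the other, not a difference in means.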

The article does not say to 'never' use WMW, nor to always use a t-test.

> Is the WMW test a bad test? No, but it is not always an appropriate alternative to the t-test. The WMW test is most useful for the analysis of ordinal data and may also be used in smaller studies, under certain conditions, to compare means or medians.

Depending on the situation, a person might always use a t-test without checking normality, because of *experience* with the distributions that tend to occur. Sure, one can think of situations where t-tests on samples of 30 or 50 are a lot less powerful (p-values that are too high), but if you never deal with such cases then you can always use a t-test.

Something else.

If you have a sample size of 1000, then you might consider that the mean is not the only thing that matters, and you could look at more than just differences in means. In that case a WMW test is actually not a bad idea.