Premise: not very clever in statistics!
Data: I have quantitative data (two variables, A and B) on two small groups of subjects (both N=7); I'm going to perform a t-test to check for differences between those groups. Before that, I ran a normality test on the A and B data of the first group, using the Matlab lillietest function. The test says A is normal (H0 holds), B is not (H0 rejected).
Here are my data, followed by p values:
    A        B
    0.000    0.125
    1.500    0.125
    2.375    1.125
    2.375    0.125
    5.625    0.250
    4.250    0.000
    0.750    0.000
    p=0.37   p=0.008
    H0=1     H0=0
I obtained similar results for the second group. My conclusion is that I can perform a t-test only for variable A, not for B.
Question 1): is my conclusion correct?
Question 2): given the very small sample size (n=7), how should I interpret the results of the normality test? Did I run a meaningful test? Is there a minimum sample size for it?
Best Answer
Elaboration on t.f's answer.
The normality test is a sneaky beast, because conceptually it works the other way round from a "normal" statistical test. Normally, you base your conclusions on the rejection of the null. Here, the "desired" outcome ("proof" of normality) is the non-rejection. However, failure to reject is not the same as proving the null! The fact that you cannot find an effect does not mean it is not there.
With few samples, the test has very little power, so you will rarely reject the null even when the data are not normal, and you are likely to falsely assume that your data is normal.
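To get a feel for how weak the test is on tiny samples, here is a rough simulation sketch in Python. It is not Matlab's lillietest: it uses a Lilliefors-style statistic (KS distance to a normal fitted to the sample) with a critical value that is itself simulated, and it feeds the test exponential data, which is clearly not normal. The helper names, the choice of exponential data, the 5% level and the number of simulations are all my assumptions for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()

def lilliefors_stat(sample):
    """KS distance between the sample and a normal fitted to its own mean and SD."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        f = STD_NORMAL.cdf((x - m) / s)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def critical_value(n, rng, sims=1000, alpha=0.05):
    """Simulated (1 - alpha) quantile of the statistic when the data really are normal."""
    stats = sorted(
        lilliefors_stat([rng.gauss(0, 1) for _ in range(n)]) for _ in range(sims)
    )
    return stats[int((1 - alpha) * sims)]

def rejection_rate(n, rng, sims=1000):
    """Share of exponential (non-normal) samples of size n that the test rejects."""
    crit = critical_value(n, rng, sims)
    hits = sum(
        lilliefors_stat([rng.expovariate(1.0) for _ in range(n)]) > crit
        for _ in range(sims)
    )
    return hits / sims

rng = random.Random(0)  # fixed seed so the run is repeatable
power_small = rejection_rate(7, rng)
power_large = rejection_rate(100, rng)
print(f"rejection rate, n=7:   {power_small:.2f}")
print(f"rejection rate, n=100: {power_large:.2f}")
```

With n=7 the test should miss this obviously non-normal distribution most of the time, while with n=100 it should catch it nearly always. The exact rates depend on the alternative distribution you pick.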
Conversely, if you have plenty of data, you will almost always reject your null, because no data in the real world is perfectly normal. Consider human height, which in biology is typically assumed to have a normal distribution. In fact, it has been assumed to be normal for the past 150 years (ever since Galton). However, height has clear boundaries: it cannot be negative, and it cannot be 100 meters. Therefore, it cannot be exactly normally distributed.
You will find a more detailed discussion with numerous examples here.
So what can you do?
- Is there any reason to believe that your data is not normally distributed? Can you guess the distribution a priori? Typical examples of data that are not expected to be normal include bacterial growth or the frequency of occurrence of an event.
- Use a q-q plot or a similar visual aid to make the decision.
- If forced by your thesis advisor to run a normality test, rather than focusing on the p-value alone, consider the effect size or calculate the skewness.
- Do you have similar data from other experiments? Can you use it to increase your sample size?
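As a rough illustration of the q-q and skewness suggestions, here is a small Python sketch applied to the first group's data from the question. The function names are made up for this example, the skewness is the plain moment-based version without any small-sample correction, and the plotting positions (i - 0.5)/n are just one common convention. The correlation of the q-q points is only a crude numeric stand-in for looking at the plot.

```python
from math import sqrt
from statistics import NormalDist, mean

# Group 1 data copied from the question.
A = [0.000, 1.500, 2.375, 2.375, 5.625, 4.250, 0.750]
B = [0.125, 0.125, 1.125, 0.125, 0.250, 0.000, 0.000]

def skewness(xs):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def qq_points(xs):
    """Pairs (theoretical normal quantile, observed value) for a normal q-q plot."""
    n = len(xs)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(sorted(xs), start=1)
    ]

def qq_correlation(xs):
    """Pearson correlation of the q-q points; close to 1 means a nearly straight plot."""
    pts = qq_points(xs)
    mt = mean([t for t, _ in pts])
    mx = mean([x for _, x in pts])
    cov = sum((t - mt) * (x - mx) for t, x in pts)
    var_t = sum((t - mt) ** 2 for t, _ in pts)
    var_x = sum((x - mx) ** 2 for _, x in pts)
    return cov / sqrt(var_t * var_x)

for name, data in (("A", A), ("B", B)):
    print(f"{name}: skewness={skewness(data):.2f}, q-q correlation={qq_correlation(data):.2f}")
```

On these numbers B comes out clearly more right-skewed than A, and its q-q points are further from a straight line, mostly because of the single value 1.125. With only seven points per variable, treat both figures as descriptive hints rather than as a verdict on normality.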