Applied linear statistical models by Kutner et al. states the following concerning departures from the normality assumption of ANOVA models: *Kurtosis of the error distribution (either more or less peaked than a normal distribution) is more important than skewness of the distribution in terms of the effects on inferences*.

I'm a bit puzzled by this statement and did not manage to find any related information, either in the book or online. I'm confused because I also learned that QQ-plots with heavy tails are an indication that the normality assumption is "good enough" for linear regression models, whereas skewed QQ-plots are more of a concern (i.e. a transformation might be appropriate).

Am I correct that the same reasoning goes for ANOVA and that their choice of words (*more important in terms of the effects on inferences*) was just chosen poorly? I.e. a skewed distribution has more severe consequences and should be avoided, whereas a small amount of kurtosis can be acceptable.

EDIT: As addressed by rolando2, it's hard to state that one is more important than the other in all cases, but I'm merely looking for some general insight. My main issue is that I was taught that in simple linear regression, QQ-plots with heavier tails (=kurtosis?) are OK, since the F-test is quite robust against this. On the other hand, skewed QQ-plots (parabola-shaped) are usually a bigger concern. This seems to go directly against the guidelines my textbook provides for ANOVA, even though ANOVA models can be converted to regression models and should have the same assumptions.

I'm convinced I'm overlooking something or I have a false assumption, but I cannot figure out what it might be.

#### Best Answer

The difficulty is that skewness and kurtosis are dependent; their effects can't be completely separated.

The problem is that if you want to examine the effect of a highly skew distribution, you *must* also have a distribution with high kurtosis.

In particular, kurtosis* $\geq$ skewness$^2+1$.

* (ordinary scaled fourth moment kurtosis, not excess kurtosis)
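The inequality can be checked numerically. The following is a minimal sketch, not part of the original answer: the helper name `sample_skewness_kurtosis`, the seed, the sample size and the choice of distributions are all arbitrary assumptions made for illustration. It uses plain moment ratios (dividing by $n$), for which the bound holds for any data set, and it uses ordinary rather than excess kurtosis.

```python
import random
from math import fsum

def sample_skewness_kurtosis(xs):
    """Return (skewness, kurtosis) computed as plain moment ratios.

    Kurtosis is the ordinary scaled fourth moment (normal = 3),
    not excess kurtosis. Hypothetical helper, for illustration only.
    """
    n = len(xs)
    mean = fsum(xs) / n
    m2 = fsum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = fsum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = fsum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

rng = random.Random(0)  # arbitrary seed, only for reproducibility
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(5000)],
    "exponential": [rng.expovariate(1.0) for _ in range(5000)],
    "lognormal": [rng.lognormvariate(0.0, 1.0) for _ in range(5000)],
}

for name, xs in samples.items():
    skew, kurt = sample_skewness_kurtosis(xs)
    bound = skew ** 2 + 1
    print(f"{name:12s} skewness={skew:8.3f}  kurtosis={kurt:9.3f}  skewness^2+1={bound:9.3f}")
```

In every row the kurtosis column should be at least as large as the last column, and the more skewed samples should also show the larger kurtosis.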

Khan and Rayner (mentioned in the earlier answer) work with a family that allows some exploration of the impact of skewness and kurtosis, but they cannot avoid this issue, so their attempt to separate them severely limits the extent to which the effect of skewness can be explored.

If one holds the kurtosis ($\beta_2$) constant, one cannot make the skewness more than $\sqrt{\beta_2-1}$. If one wishes to consider unimodal distributions, the skewness is even more restricted.

For example, if you want to see the effect of high skewness – say skewness > 5, you *cannot* get a distribution with kurtosis less than 26!
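To make the arithmetic concrete, here is a small sketch of the bound in both directions. The function names are hypothetical and introduced only for illustration. It also uses a two-point (Bernoulli) distribution, which as far as I can tell sits exactly on the boundary kurtosis = skewness$^2+1$, to show that the minimum is actually attainable (though only by a distribution that is very far from anything you would call near-normal).

```python
from math import sqrt

def min_kurtosis(skewness):
    """Smallest possible (ordinary, not excess) kurtosis for a given skewness."""
    return skewness ** 2 + 1

def max_abs_skewness(kurtosis):
    """Largest possible |skewness| for a given (ordinary) kurtosis."""
    return sqrt(kurtosis - 1)

def bernoulli_skewness_kurtosis(p):
    """Skewness and ordinary kurtosis of a Bernoulli(p) variable."""
    q = 1 - p
    skewness = (q - p) / sqrt(p * q)
    kurtosis = 1 / (p * q) - 3
    return skewness, kurtosis

def bernoulli_p_for_skewness(skewness):
    """Success probability p whose Bernoulli(p) has the requested skewness."""
    return (1 - skewness / sqrt(skewness ** 2 + 4)) / 2

print(min_kurtosis(5))        # 26
print(max_abs_skewness(3))    # largest |skewness| compatible with normal kurtosis

p = bernoulli_p_for_skewness(5.0)
skew, kurt = bernoulli_skewness_kurtosis(p)
print(p, skew, kurt)          # a two-point distribution sitting on the bound
```

The Bernoulli case is only there to show the bound is tight; restricting attention to unimodal or continuous distributions pushes the minimum kurtosis higher still.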

So if you want to investigate the impact of high skewness, you are unable to avoid investigating the impact of high kurtosis. Consequently if you do try to separate them, you in effect hold yourself unable to assess the effect of increasing skewness to high levels.

That said, at least for the distribution family they considered, and within the limits that the relationship between them poses, the investigation by Khan and Rayner does seem to suggest that kurtosis is the main problem.

However, even if the conclusion is completely general, if you happen to have a distribution with (say) skewness 5, it's likely to be little comfort to say "but it's not the skewness that's the problem!" Once your skewness is $>\sqrt{2}$, the kurtosis can no longer equal that of the normal, and beyond that point the minimum possible kurtosis grows rapidly with increasing skewness.