### Context:

In a previous question, @Robbie asked in a study with around 600 cases why tests of normality suggested significant non-normality yet the plots suggested normal distributions. Several people made the point that significance tests of normality are not very useful. With small samples, such tests don't have much power to detect mild violations of normality and with large samples, they will detect violations of normality that are sufficiently small not to be of concern.

It seems to me that this problem is similar to the debate around significance testing and effect sizes. If you only focus on significance tests, when you have big samples, you can detect small effects that are irrelevant for practical purposes, and with small samples you don't have sufficient power.

In a few instances I've even seen textbooks advise people that you can have "too large" a sample, because small effects will be statistically significant.

In the context, of significance testing and effect sizes, one simple resolution is to focus on estimating the size of effect of interest, rather than being obsessed with the binary decision rule of whether there is or is not an effect. Confidence intervals on effect sizes is one such approach, or you could adopt some form of Bayesian approach. Furthermore, various research domains build up ideas about what a given effect size means in a practical sense, for better or worse, applying heuristic labels such as "small", "medium", and "large effect". This also leads to the intelligent recommendation of maximising sample size in order to maximise accuracy in estimating a given parameter of interest.

This makes me wonder why a similar approach based on confidence intervals of effect sizes is not more widely espoused in relation to assumption testing, and normality testing in particular.

### Question:

- What is the best single index of the degree to which the data violates normality?
- Or is it just better to talk about multiple indices of normality violation (e.g., skewness, kurtosis, outlier prevalence)?
- How can confidence intervals be calculated (or perhaps a Bayesian approach) for the index?
- What kind of verbal labels could you assign to points on that index to indicate the degree of violation of normality (e.g., mild, moderate, strong, extreme, etc.)?

The aim of such labels could be to assist analysts with less experience in training their intuition of when violations of normality are problematic.

#### Best Answer

*A) What is the best single index of the degree to which the data violates normality?*

*B) Or is it just better to talk about multiple indices of normality violation (e.g., skewness, kurtosis, outlier prevalence)?*

I would vote for B. Different violations have different consequences. For example, unimodal, symmetrical distributions with heavy tails make your CIs very wide and presumably reduce the power to detect any effects. The mean, however, still hits the "typical" value. For very skewed distributions, the mean for example, might not be a very sensible index of "the typical value".

*C) How can confidence intervals be calculated (or perhaps a Bayesian approach) for the index?*

I don't know about Bayesian statistics, but concerning classical test of normality, I'd like to cite Erceg-Hurn et al. (2008) [2]:

Another problem is that assumption tests have their own assumptions. Normality tests usually assume that data are homoscedastic; tests of homoscedasticity assume that data are normally distributed. If the normality and homoscedasticity assumptions are violated, the validity of the assumption tests can be seriously compromised. Prominent statisticians have described the assumption tests (e.g., Levene’s test, the Kolmogorov–Smirnov test) built into software such as SPSS as fatally flawed and recommended that these tests never be used (D’Agostino, 1986; Glass & Hopkins, 1996).

*D) What kind of verbal labels could you assign to points on that index to indicate the degree of violation of normality (e.g., mild, moderate, strong, extreme, etc.)?*

Micceri (1989) [1] did an analysis of 440 large scale data sets in psychology. He assessed the symmetry and the tail weight and defined criteria and labels. Labels for asymmetry range from 'relatively symmetric' to 'moderate –> extreme –> exponential asymmetry'. Labels for tail weight range from 'Uniform –> less than Gaussian –> About Gaussian –> Moderate –> Extreme –> Double exponential contamination'. Each classification is based on multiple, robust criteria.

He found, that from these 440 data sets only 28% were relatively symmetric, and only 15% were about Gaussian concerning tail weights. Therefore the nice title of the paper:

The unicorn, the normal curve, and other improbable creatures

I wrote an `R`

function, that automatically assesses Micceri's criteria and also prints out the labels:

`# This function prints out the Micceri-criteria for tail weight and symmetry of a distribution micceri <- function(x, plot=FALSE) { library(fBasics) QS <- (quantile(x, prob=c(.975, .95, .90)) - median(x)) / (quantile(x, prob=c(.75)) - median(x)) n <- length(x) x.s <- sort(x) U05 <- mean(x.s[(.95*n ):n]) L05 <- mean(x.s[1:(.05*n)]) U20 <- mean(x.s[(.80*n):n]) L20 <- mean(x.s[1:(.20*n)]) U50 <- mean(x.s[(.50*n):n]) L50 <- mean(x.s[1:(.50*n)]) M25 <- mean(x.s[(.375*n):(.625*n)]) Q <- (U05 - L05)/(U50 - L50) Q1 <- (U20 - L20)/(U50 - L50) Q2 <- (U05 - M25)/(M25 - L05) # mean/median interval QR <- quantile(x, prob=c(.25, .75)) # Interquartile range MM <- abs(mean(x) - median(x)) / (1.4807*(abs(QR[2] - QR[1])/2)) SKEW <- skewness(x) if (plot==TRUE) plot(density(x)) tail_weight <- round(c(QS, Q=Q, Q1=Q1), 2) symmetry <- round(c(Skewness=SKEW, MM=MM, Q2=Q2), 2) cat.tail <- matrix(c(1.9, 2.75, 3.05, 3.9, 4.3, 1.8, 2.3, 2.5, 2.8, 3.3, 1.6, 1.85, 1.93, 2, 2.3, 1.9, 2.5, 2.65, 2.73, 3.3, 1.6, 1.7, 1.8, 1.85, 1.93), ncol=5, nrow=5) cat.sym <- matrix(c(0.31, 0.71, 2, 0.05, 0.18, 0.37, 1.25, 1.75, 4.70), ncol=3, nrow=3) ts <- c() for (i in 1:5) {ts <- c(ts, sum(abs(tail_weight[i]) > cat.tail[,i]) + 1)} ss <- c() for (i in 1:3) {ss <- c(ss, sum(abs(symmetry[i]) > cat.sym[,i]) + 1)} tlabels <- c("Uniform", "Less than Gaussian", "About Gaussian", "Moderate contamination", "Extreme contamination", "Double exponential contamination") slabels <- c("Relatively symmetric", "Moderate asymmetry", "Extreme asymmetry", "Exponential asymmetry") cat("Tail weight indexes:n") print(tail_weight) cat(paste("nMicceri category:", tlabels[max(ts)],"n")) cat("nnAsymmetry indexes:n") print(symmetry) cat(paste("nMicceri category:", slabels[max(ss)])) tail.cat <- factor(max(ts), levels=1:length(tlabels), labels=tlabels, ordered=TRUE) sym.cat <- factor(max(ss), levels=1:length(slabels), labels=slabels, ordered=TRUE) invisible(list(tail_weight=tail_weight, symmetry=symmetry, tail.cat=tail.cat, sym.cat=sym.cat)) } `

Here's a test for the standard normal distribution, a $t$ with 8 df, and a log-normal:

`> micceri(rnorm(10000)) Tail weight indexes: 97.5% 95% 90% Q Q1 2.86 2.42 1.88 2.59 1.76 Micceri category: About Gaussian Asymmetry indexes: Skewness MM.75% Q2 0.01 0.00 1.00 Micceri category: Relatively symmetric > micceri(rt(10000, 8)) Tail weight indexes: 97.5% 95% 90% Q Q1 3.19 2.57 1.94 2.81 1.79 Micceri category: Extreme contamination Asymmetry indexes: Skewness MM.75% Q2 -0.03 0.00 0.98 Micceri category: Relatively symmetric > micceri(rlnorm(10000)) Tail weight indexes: 97.5% 95% 90% Q Q1 6.24 4.30 2.67 3.72 1.93 Micceri category: Double exponential contamination Asymmetry indexes: Skewness MM.75% Q2 5.28 0.59 8.37 Micceri category: Exponential asymmetry `

[1] Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. *Psychological Bulletin, 105*, 156-166. doi:10.1037/0033-2909.105.1.156

[2] Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: An easy way to maximize the accuracy and power of your research. *American Psychologist, 63*, 591-601.

### Similar Posts:

- Solved – Using r, how might I check whether a collection of data points are normally distributed with skew
- Solved – How to understand boxplot skewness
- Solved – How to tell if the data distribution is symmetric
- Solved – Is the Shapiro Wilk test W an effect size
- Solved – External validation of clustering requires labels, but why cluster at all if you have labels