# Solved – the correct SD to use to get a 95% CI for skewed data

Let \$X = [94, 10, 100, 100, 16, 14, 100, 100, 70, 88, 100, 100, 12, 100, 100, 58, 32, 100, 32, 36, 98, 0, 100, 100, 100]\$

where \$X\$ are students' scores (each between 0 and 100). Note the many full marks!

The question is which statistics best describe the data, given that it is clearly non-Gaussian.

Option 1

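If I fit a Gaussian using maximum likelihood, I get

sample mean = 70.4 and SD = 37.96, so mean +/- 1 SD gives an interval from 32.43 to 108.36. (Strictly, 37.96 is the \$N-1\$ sample SD; the maximum-likelihood estimate divides by \$N\$ and gives 37.19.)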
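As a quick check of these figures, here is a minimal Python sketch using only the standard library. The variable names are illustrative and not taken from any MATLAB output. It prints both SD variants: the one that divides by N - 1, and the one that divides by N, which is what a strict maximum-likelihood Gaussian fit returns.

```python
import statistics

# Scores from the question (N = 25).
X = [94, 10, 100, 100, 16, 14, 100, 100, 70, 88, 100, 100, 12,
     100, 100, 58, 32, 100, 32, 36, 98, 0, 100, 100, 100]

mean = statistics.fmean(X)         # arithmetic mean
sd_sample = statistics.stdev(X)    # divides by N - 1
sd_ml = statistics.pstdev(X)       # divides by N (Gaussian ML estimate)

# Mean +/- 1 SD, using the N - 1 version quoted in the text.
interval = (mean - sd_sample, mean + sd_sample)

print(f"mean = {mean:.2f}")
print(f"SD (N-1) = {sd_sample:.2f}, SD (N) = {sd_ml:.2f}")
print(f"mean +/- 1 SD = [{interval[0]:.2f}, {interval[1]:.2f}]")
```

Note that the upper end of the interval lies above 100, the maximum possible score.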

If I fit a Gaussian to the data \$X\$ using `normfit` in MATLAB and ask for 95% confidence intervals on the mean and standard deviation, I get
\$\$
\begin{aligned}
\mu &= 70.4; & CI_{95\%} &= [54.73, 86.06] \\
\sigma &= 37.96; & CI_{95\%} &= [29.64, 52.80]
\end{aligned}
\$\$
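The intervals above can be reproduced, to within rounding, with the textbook t-interval for the mean and chi-square interval for the standard deviation. The sketch below is a standard-library Python version, not MATLAB's own code. The three quantiles for 24 degrees of freedom are hard-coded from standard tables, which is an assumption made to avoid extra dependencies; with SciPy available they could come from `scipy.stats.t.ppf` and `scipy.stats.chi2.ppf` instead.

```python
import math
import statistics

# Scores from the question (N = 25).
X = [94, 10, 100, 100, 16, 14, 100, 100, 70, 88, 100, 100, 12,
     100, 100, 58, 32, 100, 32, 36, 98, 0, 100, 100, 100]

n = len(X)
mean = statistics.fmean(X)
sd = statistics.stdev(X)  # N - 1 denominator

# Table values for 24 degrees of freedom (hard-coded assumption).
T_975 = 2.0639      # 97.5% quantile of Student's t
CHI2_025 = 12.401   # 2.5% quantile of chi-square
CHI2_975 = 39.364   # 97.5% quantile of chi-square

# 95% CI for the mean: mean +/- t * sd / sqrt(n).
half_width = T_975 * sd / math.sqrt(n)
ci_mean = (mean - half_width, mean + half_width)

# 95% CI for sigma: sqrt((n - 1) * sd^2 / chi-square quantile).
ss = (n - 1) * sd ** 2
ci_sd = (math.sqrt(ss / CHI2_975), math.sqrt(ss / CHI2_025))

print(f"mean CI = [{ci_mean[0]:.2f}, {ci_mean[1]:.2f}]")
print(f"sigma CI = [{ci_sd[0]:.2f}, {ci_sd[1]:.2f}]")
```

Both intervals rest on the Gaussian assumption, which is exactly what is in doubt for these scores.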

Option 2

On the other hand, what if I use left and right SDs instead? That is, I report two SD values, \$SD_{left}\$ and \$SD_{right}\$, where:
\$\$
\begin{aligned}
SD_{left} &= \sqrt{\frac{1}{N_{left}} \sum_i I(X_i < \mu)\,(X_i - \mu)^2} = 49.94 \\
SD_{right} &= \sqrt{\frac{1}{N_{right}} \sum_i I(X_i \ge \mu)\,(X_i - \mu)^2} = 29.45
\end{aligned}
\$\$

where \$\mu = 70.4\$ is the mean and \$I\$ is the indicator function, which equals 1 if its argument is true and 0 otherwise. \$N_{left} = \sum_i I(X_i < \mu) - 1 = 9\$ is the number of samples below the mean, minus 1 to reduce bias, and likewise \$N_{right} = \sum_i I(X_i \ge \mu) - 1 = 14\$.

In this case the interval around the mean is [20.46, 99.85], instead of the previous result, [32.43, 108.36].
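For reference, a minimal Python sketch of the left/right SD calculation as defined above. The helper name `one_sided_sd` is hypothetical, chosen only for this illustration. Each side's squared deviations from the overall mean are summed and divided by that side's count minus 1.

```python
import math
import statistics

# Scores from the question (N = 25).
X = [94, 10, 100, 100, 16, 14, 100, 100, 70, 88, 100, 100, 12,
     100, 100, 58, 32, 100, 32, 36, 98, 0, 100, 100, 100]

mu = statistics.fmean(X)

# Split the sample at the overall mean.
left = [x for x in X if x < mu]
right = [x for x in X if x >= mu]


def one_sided_sd(values, center):
    """Root of the summed squared deviations from `center`,
    divided by len(values) - 1."""
    ss = sum((v - center) ** 2 for v in values)
    return math.sqrt(ss / (len(values) - 1))


sd_left = one_sided_sd(left, mu)
sd_right = one_sided_sd(right, mu)

# Asymmetric interval around the mean.
interval = (mu - sd_left, mu + sd_right)

print(f"SD_left = {sd_left:.2f}, SD_right = {sd_right:.2f}")
print(f"interval = [{interval[0]:.2f}, {interval[1]:.2f}]")
```

Unlike the symmetric interval from Option 1, this one stays within the 0 to 100 score range for this data.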

Which one shall I go for, 1 or 2?
