Let's say I measured the weights of 50 chickens from my family farm, which keeps 1000 chickens. The sample mean is 5 kg, the SEM is ± 3 kg, and the 95% confidence interval is 5 ± 3 * 1.96, i.e. -0.88 kg to 10.88 kg. How should I interpret the SEM and the CI? Obviously the weight of a chicken cannot be negative.
- It seems to me SEM has little use except to calculate CI? What quantitative information can we derive from SEM? We can say the true mean weight of the 1000 chickens is likely (very qualitative) to fall between 2 kg and 8 kg (sample mean ± SEM), but do we know the probability?
- How should the negative lower bound of the CI be interpreted?
- What is the probability that the true mean weight falls between 0 kg and 10.88 kg?
It seems to me SEM has little use except to calculate CI? What quantitative information can we derive from SEM? We can say the true mean weight of the 1000 chickens is likely (very qualitative) to fall between 2 kg and 8 kg (sample mean ± SEM), but do we know the probability?
Let us begin with an observation. The SEM is not a descriptive statistic. It is derived from the data. It tells you about the sampling error of a statistic (here, the sample mean), not about the spread of weights in the population. It is an artifact of the measurement process.
Had you chosen a different statistic, such as the median, you would have had a different standard error. Likewise, had your model been different, you would have had a different standard error.
There is an infinite number of possible confidence interval functions. You are using the standard one from a textbook, but it is not the only one. It is a model that has desirable properties, so it is taught, but there could be a different interval if you chose to formally model losses you would obtain from getting a bad sample.
The SEM is providing sample-specific information. For the purposes of your question, its only use is as an interim step in a calculation.
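As a minimal sketch of that interim step, the arithmetic from the question can be reproduced with the Python standard library. The helper name `normal_ci` is made up for illustration, and the normal approximation (z of about 1.96) is the same one the question uses.

```python
from statistics import NormalDist


def normal_ci(mean, sem, level=0.95):
    """Return the (lower, upper) normal-theory confidence interval."""
    # Two-sided critical value; about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem


# Numbers taken from the question: sample mean 5 kg, SEM 3 kg.
lower, upper = normal_ci(5.0, 3.0)
print(f"{lower:.2f} kg to {upper:.2f} kg")
```

Nothing in this calculation knows that weights cannot be negative, which is why the lower bound can fall below zero.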
A confidence interval gives a range in which you have confidence about the location of the mean (or some other parameter). It tells you nothing about the distribution of the sizes of the chickens themselves.
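A small simulation may make the distinction concrete. The flock below is hypothetical: 1000 weights drawn from a normal distribution with mean 5 kg and standard deviation 1 kg, values chosen only for illustration and not taken from the question. The sketch builds the usual 95% interval for the mean from a sample of 50 and then counts how many individual chickens happen to weigh something inside it.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)

# Hypothetical flock (assumed values): normal, mean 5 kg, SD 1 kg.
flock = [random.gauss(5.0, 1.0) for _ in range(1000)]
sample = random.sample(flock, 50)

# Usual normal-theory 95% interval for the mean.
z = NormalDist().inv_cdf(0.975)
sem = stdev(sample) / len(sample) ** 0.5
lower, upper = mean(sample) - z * sem, mean(sample) + z * sem

# Fraction of individual weights that fall inside the interval for the mean.
share_inside = sum(lower <= w <= upper for w in flock) / len(flock)
print(f"CI for the mean: {lower:.2f} to {upper:.2f} kg")
print(f"Share of individual chickens inside it: {share_inside:.0%}")
```

Under these assumptions the interval for the mean is much narrower than the spread of individual weights, so only a minority of chickens fall inside it.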
The interval you may want is the tolerance interval. If you wanted to know the range where 95% of your population of chickens is likely to fall, then you want the 95% tolerance interval and not the 95% confidence interval.
How should the negative lower bound of the CI be interpreted?
The bounds of a confidence interval have no interpretation. They are random numbers. A function that generates an interval is an $\alpha$ percent confidence interval if, upon infinite repetition, the interval would cover the true value of the parameter at least $\alpha$ percent of the time.
If you create an $\alpha$ percent confidence interval and it is $[a,b]$, then the interpretation is that if you behave as if the true value were inside that range, you would be made a fool of no more than $100 - \alpha$ percent of the time once the number of repetitions became very large.
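The repetition argument can be checked by simulation. The sketch below assumes a normal population with a true mean of 4 kg and a standard deviation of 1 kg (both hypothetical), draws many samples of 50, and counts how often the textbook interval covers the true mean. The function name `coverage` and its parameters are illustrative only. Because it uses the z value rather than a t value with an estimated standard deviation, the observed rate tends to sit slightly below the nominal 95%.

```python
import random
from statistics import NormalDist, fmean, stdev


def coverage(true_mean=4.0, sd=1.0, n=50, reps=2000, level=0.95, seed=1):
    """Fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = fmean(sample)
        half_width = z * stdev(sample, m) / n ** 0.5
        # The bounds change from sample to sample; true_mean does not.
        hits += (m - half_width) <= true_mean <= (m + half_width)
    return hits / reps


rate = coverage()
print(f"Intervals covering the true mean: {rate:.1%}")
```

Each individual interval either covers the true mean or it does not; the percentage describes the procedure over many repetitions.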
A negative bound is fine. Let's imagine that you are Mother Nature, and you know the true population mean is 4 kg. You should be delighted then, because the interval $[-0.88, 10.88]$ contains the actual value. The lower bound is indeed nonsense, but Frequentist methods allow nonsense answers as long as the interval covers the true value a certain percentage of the time.
Also, note that narrow intervals are not better than wider ones. Narrow ones are not more accurate than wide ones. They are equally reliable in that each covers the true value at least a fixed percentage of the time on large repetition.
To see why, imagine that you divided the population of chickens in half randomly and weighed them. One-half of the chickens had a narrower interval than the other half. What about the randomization process made one group more accurate? Nothing.
What is the probability that the true mean weight falls between 0 kg and 10.88 kg?
That is a model-specific question. I would be concerned that your data are not normally distributed. Weights are probably close to normal among chickens of roughly equal age and diet, but the population contains chicks and very old chickens. I would be surprised to find that the weights were normally distributed on an uncontrolled basis.
However, if we pretend that the chickens are sufficiently similar to each other to be normally distributed, then we can start to address your question.
First, a confidence interval is not a statement of probability. If you want a probability, then you will need to use a Bayesian model. A Bayesian credible interval will tell you the probability that a parameter is inside some range. Frequentist methods will not do that.
The reason is that there is either a 100% or a 0% chance that the parameter is inside the range, in Frequentist thinking. In Frequentist thinking, you cannot make a probability statement about a fact.
George Washington either was the first President, or he was not. That is a factual question and not subject to probability statements. A Frequentist cannot say, "it is probably raining." A Bayesian can. It is either raining, or it is not. The parameter is either inside the range, or it is not.
What you can say is that you have 95% confidence that the interval covers the parameter. What you cannot say is that there is a 95% chance that the parameter is inside the interval. That is not true.
What you have confidence in is the procedure and not the data. Your data is a random collection. There is supposed to be nothing special about it. As such, your interval and sample mean are random too. There is nothing special about them either. The population parameter, $\mu$, is special. What makes a sample mean or a confidence interval special in any sense is their relationship to $\mu$.
They summarize the information you have gathered about $\mu$ but are not $\mu$. The procedure gives you guarantees, if your model is valid, about how often you will make incorrect decisions and take incorrect actions based on the sample that you saw.
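If a probability statement is what you are after, the Bayesian route mentioned earlier can be sketched as follows. The assumptions here are mine and are not part of the question: a flat prior on $\mu$, a normal likelihood, and the SEM of 3 kg treated as a known standard deviation of the sample mean. Under those assumptions the posterior for $\mu$ is normal with mean 5 and standard deviation 3. The helper `prob_between` is a made-up name.

```python
from statistics import NormalDist

# Assumed model (not from the question): flat prior on mu, normal likelihood,
# SEM of 3 kg treated as known. The posterior for mu is then Normal(5, 3).
posterior = NormalDist(mu=5.0, sigma=3.0)


def prob_between(dist, low, high):
    """Posterior probability that the parameter lies in [low, high]."""
    return dist.cdf(high) - dist.cdf(low)


p_question = prob_between(posterior, 0.0, 10.88)
p_symmetric = prob_between(posterior, -0.88, 10.88)
print(f"P(0 <= mu <= 10.88)     = {p_question:.3f}")
print(f"P(-0.88 <= mu <= 10.88) = {p_symmetric:.3f}")
```

A flat prior over the whole real line still puts posterior mass on negative mean weights. A prior restricted to positive weights would be more sensible for chickens and would give a different answer.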
Even tolerance intervals require you to state how often you want to be made a fool of. There is no absolute tolerance interval; there are only intervals given $\alpha$, the data, and the model.