Solved – Size of the data set for measuring variation

This might be a simple question, but I am unable to find a solution on the web. I would like to measure the variation of some numerical data. What is the minimum number of values I should use when measuring their standard deviation or coefficient of variation?

My plan is to derive groups from a set of performance metric values (all positive integers), using a "variation" measure for the grouping. For example, suppose I have 16 positive integers. First I sort them, then divide them into four groups of 4 integers each, and calculate the variation of each group as either the standard deviation or the coefficient of variation. If the difference in variation between two groups is less than a threshold, I merge them, and so on. My question is: can I use a sample of size 4 for calculating variation? Please help me.
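The procedure described above can be sketched in Python. This is only an illustration of the idea, not an endorsed method; the threshold value, the example data, and the choice to merge only adjacent (sorted) groups are assumptions:

```python
import statistics

def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)

def group_and_merge(data, group_size=4, threshold=0.05):
    """Sort the data, split it into consecutive groups of `group_size`,
    then merge each group into the previous one whenever their
    coefficients of variation differ by less than `threshold`."""
    ordered = sorted(data)
    groups = [ordered[i:i + group_size]
              for i in range(0, len(ordered), group_size)]
    merged = [groups[0]]
    for g in groups[1:]:
        if abs(cv(merged[-1]) - cv(g)) < threshold:
            merged[-1] = merged[-1] + g  # difference below threshold: merge
        else:
            merged.append(g)
    return merged

# Hypothetical set of 16 positive integers
data = [12, 15, 14, 10, 48, 52, 50, 47,
        90, 95, 91, 94, 200, 210, 205, 198]
for g in group_and_merge(data, threshold=0.05):
    print(g, round(cv(g), 3))
```

Note that `statistics.stdev` uses the n − 1 (sample) denominator, which matters noticeably when each group holds only 4 values.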

In the book Statistical Rules of Thumb (explained by Dan Goldstein here), Gerald van Belle describes how the width of a Student's t confidence interval decreases as the number of observations increases. Here is the relevant chart:

[Chart: confidence-interval width approaching an asymptote as the number of observations grows]

His "rule of thumb" is to gather at least 12 data points per sample. But as the chart shows, this is not strictly necessary. I don't fully understand your question, but the short answer is yes: you can get a measure of variation from 4 data points. In fact, you could get one from 3.
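For instance, Python's `statistics` module computes both measures from as few as 3 values (the three observations here are hypothetical):

```python
import statistics

sample = [40, 44, 51]                  # just three observations
sd = statistics.stdev(sample)          # sample SD (n - 1 denominator)
cv = sd / statistics.mean(sample)      # coefficient of variation
print(round(sd, 2), round(cv, 3))
```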

However, the more data you include in each group, the more accurate your procedure will be. So, if you have the option, you might consider gathering enough data to get 6 observations per group, or splitting the 16 values into only 3 groups instead of 4 (with the odd leftover datum assigned randomly to one of them).
