Firstly, please forgive my lack of statistics knowledge, but I am hoping someone can clear up my misunderstandings.
I am taking a sample of a physical quantity (for example, temperature). To take this sample I average the measurements over a time duration (for example, 1 sec). This gives me a sample mean and a sample standard deviation.
I then repeat this measurement procedure under the same conditions. I then would have a sample mean and sample standard deviation for each repetition.
So my question is: while each sample has variation, how do I "combine" these to describe the variation over all of the repetitions?
This seems rather fundamental, so please link any relevant references if you know of any.
Best Answer
It is not entirely clear what you are asking, and this makes it difficult to help with an answer. I will try to throw out several ideas and we can see if something sticks.
Let me change the setup a little bit because I have a hard time visualizing many temperature readings within 1 second, but you can change it back if you need; it's just a story to motivate the discussion. So, let's say you take a reading from a thermometer and write it down every 6 seconds. Thus you have 10 readings per minute, and you continue this procedure over the course of an hour. Now you could calculate the mean of each set of 10 readings as a single measure of the temperature during that minute. In addition, you could calculate:
1. the Standard Deviation ($SD$) of the readings for each minute. The equation is $SD_m=\sqrt{\frac{\Sigma_i(x_{im}-\bar{x}_m)^2}{n_m-1}}$, where $n_m$ is the number of measurements in the $m$th interval (which is always 10 in our story), the $x_{im}$ are the individual measurements, and $\bar{x}_m$ is the mean for that interval. This tells you how much the data are varying around your mean, just as you say. In the end you would have 60 of these.
2. the pooled Standard Deviation ($SD_{pooled}$) of all of your measurements. The equation is $SD_{pooled}=\sqrt{\frac{\Sigma_m(n_m-1)*SD_m^2}{(\Sigma_m n_m)-m}}$, where the $m$ in the denominator is the total number of intervals (60 here). Thus, for each interval you multiply the number of measurements minus one by the squared standard deviation, and then sum those products. The sum is divided by the total number of data minus the number of means used, and the square root is taken of the quotient. This is a more precise estimate of the measurement variability in your study than any single $SD_m$, because it uses more data, and it is valid under the assumption that the true variability was constant across intervals.
3. the Standard Error ($SE$), an estimate of how much the means would vary on repeated sampling. The equation is $SE_m=\frac{SD_m}{\sqrt{n_m}}$. This can be done for each interval in the study (which would give you 60 estimates of the SE).
4. the Standard Deviation of your means ($SD_{\bar{x}}$) from each minute. The equation is $SD_{\bar{x}}=\sqrt{\frac{\Sigma_m(\bar{x}_m-\bar{x}_{.})^2}{m-1}}$, where $\bar{x}_{.}$ is the mean of all of your interval means and the $m$ in the denominator is again the number of intervals. Since you have calculated many means (60), this is an empirical measure of how much they vary on repeated sampling. (Now things get a touch more complicated.) This is valid under the assumption that all of these means come from the same population distribution. In our example, these means are sampled over time. Thus, this approach is valid under the assumption that the system is stationary, which is typically not true of time-series data. For example, if you are sampling outside air temperature, that varies over the course of the day, so #4 would not be valid. On the other hand, if you are sampling inside air temperature, and you have an awesome heating / air conditioning system, maybe it could be.
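To make the four quantities concrete, here is a minimal Python sketch that computes each of them from a list of per-interval readings. The function name `summarize`, the dictionary keys, and the simulated readings (60 intervals of 10 readings, drawn from a normal distribution with mean 20 and SD 0.5) are all assumptions made for illustration; they are not part of the original question.

```python
import math
import random
import statistics


def summarize(readings_by_interval):
    """Compute the four variability measures for a list of per-interval readings.

    `readings_by_interval` is a list of lists; each inner list holds the
    readings from one interval and must contain at least two values.
    """
    sizes = [len(r) for r in readings_by_interval]
    means = [statistics.mean(r) for r in readings_by_interval]
    n_intervals = len(readings_by_interval)

    # 1. SD within each interval (statistics.stdev uses the n - 1 denominator).
    sds = [statistics.stdev(r) for r in readings_by_interval]

    # 2. Pooled SD: weight each squared SD by its degrees of freedom.
    pooled_var = sum((n - 1) * sd ** 2 for n, sd in zip(sizes, sds))
    pooled_var /= sum(sizes) - n_intervals
    sd_pooled = math.sqrt(pooled_var)

    # 3. Standard error of the mean, one per interval.
    ses = [sd / math.sqrt(n) for n, sd in zip(sizes, sds)]

    # 4. SD of the interval means around their own mean.
    sd_of_means = statistics.stdev(means)

    return {
        "sd_per_interval": sds,
        "sd_pooled": sd_pooled,
        "se_per_interval": ses,
        "sd_of_means": sd_of_means,
    }


# Hypothetical data: one reading every 6 seconds for an hour.
random.seed(0)
readings = [[random.gauss(20.0, 0.5) for _ in range(10)] for _ in range(60)]
result = summarize(readings)
```

With real data you would pass your own list of per-repetition readings in place of `readings`. The sketch uses only the standard library so that the arithmetic in each step stays visible.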
I'm honestly not sure which of these you're asking about. From the question, you clearly understand #1, and from the comments, I gather you're familiar with #3. (Which, as you recognize, is related to the central limit theorem; specifically, if your data are normally distributed, the sampling distribution of the mean will also be normally distributed with a standard deviation estimated by #3.) When you ask about how to "combine" these, I'm guessing you're looking for either #2 or #4.
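As a rough illustration of how #3 and #4 relate, the sketch below compares the SD of the interval means (#4) with a standard error built from the pooled SD (#2 divided by $\sqrt{n}$). If the readings are independent and the system is stationary, the two should be of similar size; if the mean drifts over time, #4 will be noticeably larger. The helper name `sd_of_means_vs_pooled_se`, the simulated data, and the drift rate are assumptions for illustration only, and the helper assumes every interval has the same number of readings.

```python
import math
import random
import statistics


def sd_of_means_vs_pooled_se(readings_by_interval):
    """Return (SD of the interval means, pooled SD / sqrt(n)).

    Assumes every interval holds the same number of readings (at least two).
    """
    sizes = [len(r) for r in readings_by_interval]
    means = [statistics.mean(r) for r in readings_by_interval]

    # Pooled variance: degrees-of-freedom-weighted average of the variances.
    pooled_var = sum((len(r) - 1) * statistics.variance(r)
                     for r in readings_by_interval)
    pooled_var /= sum(sizes) - len(sizes)

    n = sizes[0]  # common interval size (assumption: all intervals equal)
    pooled_se = math.sqrt(pooled_var / n)
    return statistics.stdev(means), pooled_se


random.seed(1)

# Stationary system: the true mean stays at 20 in every interval.
stationary = [[random.gauss(20.0, 0.5) for _ in range(10)] for _ in range(60)]

# Drifting system: the true mean rises by 0.05 per interval.
drifting = [[random.gauss(20.0 + 0.05 * m, 0.5) for _ in range(10)]
            for m in range(60)]

stationary_pair = sd_of_means_vs_pooled_se(stationary)
drifting_pair = sd_of_means_vs_pooled_se(drifting)
```

In the drifting case the pooled SD still describes the reading-to-reading noise within an interval, while the SD of the means also absorbs the trend, which is why the two numbers diverge.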