Not able to understand the intuition behind the $z$-test and $t$-test

I am not able to understand and appreciate the basic intuition behind the need for this whole hypothesis testing framework: different kinds of tests, looking up tables, significance levels, etc. Why can't the simple approach below work? Please let me know what is missing from my understanding:

Say we want to test whether a particular thing roughly (within acceptable limits) holds true on a population, say the content weight in packets of some product. Let's say the expected weight is $w_0$. We can take as large a sample $n$ as possible and find the weight of each packet. If $n_1$ packets weigh close to $w_0$ and $n - n_1$ do not, then if $n_1/n$ is large enough, we can say that the property holds (i.e. weights of packets are within acceptable limits). Now, where is the scope of this whole hypothesis testing framework, consulting tables, plotting graphs, etc.?

Say we want to test if a particular thing holds true on a population or not,

This does sound like the sort of thing people tend to use hypothesis testing for.

say mean of content weights in packets of some products

Well, in that particular kind of situation, you might be interested in a slightly different question than the usual hypothesis tests will tend to answer.

We can take as large a sample as possible and find mean of weight in that; Let's call it $w$.

Well let's be more careful to emphasize the distinction between random variable, observed values and hypothesized population parameters. Statisticians have a standard notation partly to make that clear.

Let the hypothesized mean weight in the population be $\mu_0$. Let the random variable representing the sample mean weight be $\bar{X}$ and let the observed sample mean weight in our particular sample be $\bar{x}$.

Then error is $|w - w_0|/w$.

i.e. the statistic you're suggesting is $|\bar{X}-\mu_0|/\bar{X}$, and its observed value in our particular sample is $|\bar{x}-\mu_0|/\bar{x}$.

That's the absolute relative error, but for some reason you've computed the error relative to the sample mean (a quantity subject to sampling noise at the least) rather than the thing you're interested in, $\mu_0$, which is not subject to noise, systematic error or anything else. Which is to say, if you're going to construct a statistic like that, $|\bar{x}-\mu_0|/\mu_0$ might often be a more obvious choice.
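As a small illustration of the two versions of the statistic, here is a sketch in Python. The numbers (`mu_0 = 100.0`, `x_bar = 100.8`) are made up for the example and do not come from the question.

```python
# Hypothetical values, chosen only for illustration.
mu_0 = 100.0    # hypothesized population mean (fixed, no noise)
x_bar = 100.8   # observed sample mean (varies from sample to sample)

# Relative error as written in the question: divided by the sample mean.
err_vs_sample = abs(x_bar - mu_0) / x_bar

# Relative error divided by the hypothesized mean instead.
err_vs_mu0 = abs(x_bar - mu_0) / mu_0

print(f"relative to sample mean: {err_vs_sample:.5f}")
print(f"relative to mu_0:        {err_vs_mu0:.5f}")
```

The two values are close here, but only the second has a denominator that stays the same from one sample to the next.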

If this error is within acceptable limits,

Here's where things seem to be diverging.

If "acceptable" is based on some external standard (like "is my sample mean within 1% of $\mu_0$") where 1% has been judged by you to be close, you now appear to be answering a different question to "we want to test if a particular thing holds true on a population or not, … mean of content weights in packets of some products".

That's not saying the question you're now addressing would be a bad one to answer, but if I understand you correctly, it's not quite the same question you started with.

then we can say that the property holds.

Well no, you can't. You have no basis on which to say the population mean is $\mu_0$, only that the sample mean happened in this instance to be close (in a particular sense) to it. Let's say your criterion of acceptability is that the absolute relative error is under 1%, and you observe a sample mean that's 0.8% above the hypothesized mean. Further, let's say that the actual population mean is a little above the hypothesized mean (maybe it's 0.68% above, say, but the exact figure doesn't matter, because the sample mean gives us our best estimate of it). In a sufficiently large sample you will be able to tell that the population mean is different from $\mu_0$ (if the error of 0.8% is larger than you could reasonably expect to see, given random sampling from a population with mean $\mu_0$).
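To make the sample-size point concrete, here is a minimal sketch of a two-sided $z$-test using only the Python standard library. It assumes the population standard deviation is known, and every number in it (a hypothesized mean of 100, a standard deviation of 5, an observed sample mean of 100.8) is invented for illustration. The helper name `z_test_two_sided` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(x_bar, mu_0, sigma, n):
    """Return (z, p) for H0: population mean == mu_0, with sigma assumed known."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu_0) / standard_error
    # Two-sided p-value: probability of a |z| at least this large under H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical values: the sample mean is 0.8% above the hypothesized mean.
mu_0 = 100.0
sigma = 5.0
x_bar = 100.8

for n in (25, 400, 10000):
    z, p = z_test_two_sided(x_bar, mu_0, sigma, n)
    print(f"n = {n:>5}: z = {z:6.2f}, p = {p:.4g}")
```

Under these assumptions the same 0.8% discrepancy is unremarkable at n = 25 (p is roughly 0.42) but clearly inconsistent with $\mu_0$ by n = 400, even though it passes the "within 1%" check in every case.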

Now, where is the scope of this whole hypothesis testing framework

Answering instead the question you started with.

Now there is a particular kind of hypothesis test related to questions about whether a population mean is "within acceptable bounds" of a specified amount (which sounds like what you need for this situation). That's called equivalence testing. What your own framework leaves out (but equivalence testing does not) is that to be reasonably confident the population mean lies inside the acceptable range, you will need the sample mean to be well inside it (since the sample mean can diverge from the population mean).

Statistical ideas can help you figure out how far inside you would need to be for that.
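One common way to carry out an equivalence test is the "two one-sided tests" (TOST) procedure. The sketch below is a simplified version that assumes a known population standard deviation, so it uses the normal distribution rather than the $t$ distribution. The function names and all numbers (hypothesized mean 100, acceptable bound of ±1, standard deviation 5, n = 400) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def tost_known_sigma(x_bar, mu_0, delta, sigma, n, alpha=0.05):
    """TOST for H0: |mu - mu_0| >= delta, with sigma assumed known.

    Returns (p, equivalent), where p is the larger of the two one-sided
    p-values and equivalent is True when p < alpha.
    """
    se = sigma / sqrt(n)
    std_normal = NormalDist()
    # Test 1: H0 is mu <= mu_0 - delta; small p suggests mu is above the lower bound.
    p_lower = 1 - std_normal.cdf((x_bar - (mu_0 - delta)) / se)
    # Test 2: H0 is mu >= mu_0 + delta; small p suggests mu is below the upper bound.
    p_upper = std_normal.cdf((x_bar - (mu_0 + delta)) / se)
    p = max(p_lower, p_upper)
    return p, p < alpha


def required_margin(sigma, n, alpha=0.05):
    """How far inside the bounds the sample mean must be to declare equivalence."""
    return NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)


# Hypothetical setup: acceptable range is mu_0 +/- 1 (i.e. within 1%).
mu_0, delta, sigma, n = 100.0, 1.0, 5.0, 400

print(f"margin needed inside each bound: {required_margin(sigma, n):.3f}")
for x_bar in (100.8, 100.3):
    p, equivalent = tost_known_sigma(x_bar, mu_0, delta, sigma, n)
    print(f"x_bar = {x_bar}: p = {p:.4g}, equivalent = {equivalent}")
```

With these made-up numbers, a sample mean of 100.8 is inside the ±1 range but too close to the edge to conclude equivalence, while 100.3 is far enough inside. With an unknown standard deviation you would estimate it from the sample and use $t$ quantiles instead.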
