I want to calculate $p$-values using the statistics program R.
I want to compare multiple groups to a placebo group, so I want to test the null hypotheses $H_0^{11}: \mu_1 = \mu_2, \ldots, H_0^{1m}: \mu_1 = \mu_m$. My test statistics $T^1, \ldots, T^m$, where $T^1$ is the test statistic for $H_0^{11}$ and so on, are asymptotically normally distributed, and under the null hypothesis the vector of test statistics $(T^1, \ldots, T^m)$ has a multivariate normal distribution with mean $\mu = (0, \ldots, 0)$ and covariance matrix $\mathrm{Cov}$.
Let's say $m=3$, $\mu=(0,0,0)$, and
\begin{align}
\mathrm{Cov} =
\begin{pmatrix}
1 & 0.5 & 0.1 \\
0.5 & 1 & 0.1 \\
0.1 & 0.1 & 1
\end{pmatrix}.
\end{align}
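In R, that setup might look like this (I'm using `mu` and `Cov` as variable names):

```r
# Null mean vector and covariance matrix from the example above
mu  <- c(0, 0, 0)
Cov <- matrix(c(1,   0.5, 0.1,
                0.5, 1,   0.1,
                0.1, 0.1, 1),
              nrow = 3, byrow = TRUE)
```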
Now I have observed, for example, $T_1 = 0.2$, $T_2 = 1.3$, $T_3 = -0.4$.
I've thought about something like
$$p_1 = 1 - \texttt{pmvnorm}(\mathrm{lower} = (-T_1, -\infty, -\infty),\ \mathrm{upper} = (T_1, \infty, \infty),\ \mathrm{mean} = \mu,\ \mathrm{sigma} = \mathrm{Cov}).$$
But I'd get the same $p_1$ for any covariance matrix with diagonal elements equal to 1, which seems wrong, since the correlations between the test statistics aren't taken into account. (The reason is that integrating the second and third coordinates over $(-\infty, \infty)$ marginalizes them out, so only the marginal distribution of $T^1$ matters.) But I don't actually know how else to calculate the $p$-value, so any advice would be helpful.
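For instance, here is a minimal sketch with the `mvtnorm` package (continuing with `mu` and `Cov` as defined above) that shows the problem:

```r
library(mvtnorm)  # provides pmvnorm()

T1 <- 0.2  # observed value of the first test statistic

p1 <- 1 - pmvnorm(lower = c(-T1, -Inf, -Inf),
                  upper = c( T1,  Inf,  Inf),
                  mean  = mu, sigma = Cov)
p1
# Because coordinates 2 and 3 are integrated over (-Inf, Inf), this reduces
# to the univariate two-sided value 2 * pnorm(-abs(T1)), regardless of the
# off-diagonal entries of Cov.
```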
Thank you!
Best Answer
I'm not very good at drawing in 3 dimensions, so here is a 2D view of what you're calculating with that definition of $p_1$:
This is a rectangular region going to infinity and just touching the observed point $(T_1, T_2, T_3)$ at the corner. While that is certainly a value that can be calculated, it is not particularly meaningful.
It is much more common to construct a hypothesis test by calculating something like this:
where the measure of the orange region is now the probability that a random point would have been "less likely" (e.g., have a lower probability density) than $(T_1, T_2, T_3)$. This can also be interpreted as the probability that a random point would be further from the origin in the Mahalanobis metric.
The formal test statistic is then:
$$ d = \sqrt{(\mathbf{x} - \boldsymbol\mu)^\mathrm{T} \, \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu)} $$
where $\boldsymbol\Sigma$ is the covariance matrix, $\boldsymbol\Sigma^{-1}$ is the precision matrix, and $\mathbf{x}$ is the vector $(T_1, T_2, T_3)$ in your notation. If the true population parameters $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are known, then $d^2 \sim \chi^2_m$ (read as: $d^2$ has the chi-squared distribution with $m$ degrees of freedom; here $m = 3$). If, on the other hand, $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are empirical estimates from the same sample, then $d^2$ follows Hotelling's $T^2$ distribution.
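In R, a minimal sketch of this test for your example, assuming $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are known (base R's `mahalanobis()` returns the squared distance $d^2$ directly):

```r
mu    <- c(0, 0, 0)
Sigma <- matrix(c(1,   0.5, 0.1,
                  0.5, 1,   0.1,
                  0.1, 0.1, 1),
                nrow = 3, byrow = TRUE)
x <- c(0.2, 1.3, -0.4)  # observed (T_1, T_2, T_3)

# Squared Mahalanobis distance: d^2 = (x - mu)' Sigma^{-1} (x - mu)
d2 <- mahalanobis(x, center = mu, cov = Sigma)

# With known parameters, d^2 ~ chi-squared with m = 3 degrees of freedom,
# so the p-value is the upper-tail probability
p_value <- pchisq(d2, df = 3, lower.tail = FALSE)
p_value
```

Unlike the rectangular region in your calculation, this $p$-value does change when the off-diagonal entries of the covariance matrix change, because $\boldsymbol\Sigma^{-1}$ enters the distance.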
Of course, only you can know exactly what hypothesis you want to test. I'm just showing one common way that others have approached this kind of problem; I can't know your specific situation in detail. Think carefully about which definition is most useful for what you are trying to accomplish!