Calculate p-value of multivariate normal distribution

I want to calculate $p$-values using the statistics program R.

I want to compare multiple groups against a placebo group, so I want to test the null hypotheses $H^{11}_0 : \mu_1 = \mu_2, \ldots, H^{1m}_0 : \mu_1 = \mu_m$. My test statistics $T^1, \ldots, T^m$, where $T^1$ is the test statistic for $H^{11}_0$ and so on, are asymptotically normally distributed, and under the null hypothesis the vector of test statistics $(T^1, \ldots, T^m)$ has a multivariate normal distribution with mean $\mu = (0, \ldots, 0)$ and covariance matrix $\mathrm{Cov}$.
Let's say $m = 3$, $\mu = (0, 0, 0)$ and

$$\mathrm{Cov} = \begin{pmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 1 & 0.1 \\ 0.1 & 0.1 & 1 \end{pmatrix}.$$
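For reference, here is one way to encode this setup in R using base functions only (the variable names `mu`, `Sigma`, `U` and `ev` are my own choice, not from the question). `chol()` stops with an error if the matrix is not positive definite, which makes it a quick sanity check that `Cov` is a valid covariance matrix:

```r
# Null mean and covariance of (T_1, T_2, T_3), as given in the question
mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.1,
                  0.5, 1.0, 0.1,
                  0.1, 0.1, 1.0),
                nrow = 3, byrow = TRUE)

# A valid covariance matrix must be symmetric and positive definite.
stopifnot(isSymmetric(Sigma))
U <- chol(Sigma)                              # upper triangular, t(U) %*% U equals Sigma
ev <- eigen(Sigma, symmetric = TRUE)$values   # all eigenvalues should be > 0
ev
```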

Now suppose I have observed, for example, $T_1 = 0.2$, $T_2 = 1.3$, $T_3 = -0.4$.
I've thought about something like

`p_1 = 1 - pmvnorm(lower = c(-T_1, -Inf, -Inf), …`

But I would get the same $p_1$ for any covariance matrix whose diagonal elements equal 1, which seems wrong, since the correlations between the test statistics are not taken into account. However, I don't know how else to calculate the $p$-value, so any advice would be helpful.
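To see why the covariance drops out, here is a small Monte Carlo sketch in base R. It assumes the truncated call restricts only the first coordinate to $[-T_1, T_1]$ and leaves the other two unrestricted; in that case the other coordinates are integrated over the whole real line, so $p_1$ reduces to the two-sided marginal $p$-value $2\Phi(-|T_1|)$ whatever the correlations are. The helper `rmvnorm_chol` is a hypothetical name for a Cholesky-based sampler, not a library function:

```r
set.seed(1)

# Hypothetical helper (not a library function): draw n samples from N(mu, Sigma);
# rows of Z %*% chol(Sigma) have covariance t(chol(Sigma)) %*% chol(Sigma) = Sigma
rmvnorm_chol <- function(n, mu, Sigma) {
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(Z %*% chol(Sigma), 2, mu, "+")
}

T1 <- 0.2
mu <- c(0, 0, 0)
Sigma_corr <- matrix(c(1.0, 0.5, 0.1,
                       0.5, 1.0, 0.1,
                       0.1, 0.1, 1.0), nrow = 3, byrow = TRUE)
Sigma_indep <- diag(3)  # same unit variances, zero correlations

# Assumed region: only the first coordinate is restricted to [-T1, T1],
# so p_1 = P(|X_1| > T1) and the other coordinates never enter
p1_mc <- function(Sigma, n = 2e5) {
  X <- rmvnorm_chol(n, mu, Sigma)
  mean(abs(X[, 1]) > T1)
}

p_corr <- p1_mc(Sigma_corr)
p_indep <- p1_mc(Sigma_indep)
p_exact <- 2 * pnorm(-abs(T1))  # two-sided marginal p-value, about 0.84
c(correlated = p_corr, independent = p_indep, exact = p_exact)
```

Both simulated values agree with $2\Phi(-|T_1|)$ up to simulation noise, which is exactly the behaviour described in the question.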

Thank you!

I'm not very good at drawing in 3 dimensions, so here is a 2D view of what you're calculating with that definition of $p_1$:

[Figure: 2D view of the region integrated by the proposed definition of $p_1$]

This is a rectangular region going to infinity and just touching the observed point $(T_1, T_2, T_3)$ at the corner. While that is certainly a value that can be calculated, it is not particularly meaningful.

It is much more common to construct a hypothesis test by calculating something like this:

[Figure: 2D view of the region (shaded orange) of points with lower probability density than the observed point]

Here the measure of the orange region is the probability that a random point would have been "less likely" (i.e. have a lower probability density) than $(T_1, T_2, T_3)$. This can equivalently be interpreted as the probability that a random point would be farther from the mean (here the origin) in Mahalanobis distance.
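The equivalence between "lower density" and "larger Mahalanobis distance" can be checked numerically. The following base-R Monte Carlo sketch (my own illustration; the seed and sample size are arbitrary) simulates under the null for the question's example and counts the points that lie farther out than the observation, which estimates the probability of the orange region:

```r
set.seed(42)

mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.1,
                  0.5, 1.0, 0.1,
                  0.1, 0.1, 1.0), nrow = 3, byrow = TRUE)
x <- c(0.2, 1.3, -0.4)  # observed (T_1, T_2, T_3)

# Simulate n points under the null: rows of Z %*% chol(Sigma) have covariance Sigma
n <- 2e5
X <- sweep(matrix(rnorm(n * 3), nrow = n) %*% chol(Sigma), 2, mu, "+")

# The N(mu, Sigma) density is a decreasing function of the squared Mahalanobis
# distance, so "lower density than x" is the same event as "larger distance than x"
d2_sim <- mahalanobis(X, center = mu, cov = Sigma)
d2_obs <- mahalanobis(x, center = mu, cov = Sigma)

# Monte Carlo estimate of the probability mass of the orange region
p_mc <- mean(d2_sim > d2_obs)
p_mc
```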

The formal test statistic is then:

$$ d = \sqrt{(\mathbf{x}-\boldsymbol{\mu})^\mathrm{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})} $$
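A direct translation of this formula into R, as a sketch using the question's numbers (variable names are my own). It uses `solve(Sigma, v)` to compute $\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})$ without forming the inverse explicitly, and cross-checks against base R's `mahalanobis()`, which returns the *squared* distance $d^2$:

```r
mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.1,
                  0.5, 1.0, 0.1,
                  0.1, 0.1, 1.0), nrow = 3, byrow = TRUE)
x <- c(0.2, 1.3, -0.4)  # observed (T_1, T_2, T_3)

# d^2 = (x - mu)^T Sigma^{-1} (x - mu), computed by solving a linear system
v <- x - mu
d2 <- sum(v * solve(Sigma, v))
d <- sqrt(d2)

# stats::mahalanobis() returns the squared distance d^2, not d
d2_builtin <- mahalanobis(x, center = mu, cov = Sigma)
c(d = d, d2 = d2, d2_builtin = d2_builtin)  # d2 is about 2.21
```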

Here $\Sigma$ is the covariance matrix, $\Sigma^{-1}$ is the precision matrix, and $\mathbf{x}$ is the vector $(T_1, T_2, T_3)$ in your notation. If the true population parameters $\mu$ and $\Sigma$ are known, then $d^2 \sim \chi^2_k$, where $k$ is the dimension of $\mathbf{x}$ (read as "$d^2$ has the chi-square distribution with $k$ degrees of freedom"); in your example $k = 3$. If, on the other hand, $\mu$ and $\Sigma$ are empirical estimates from the same sample, then $d^2$ instead follows a (suitably scaled) Hotelling's $T^2$ distribution.
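In the question's setting $\mu$ and $\Sigma$ are specified under the null, so the $p$-value is the upper tail of $\chi^2_3$ evaluated at $d^2$. A minimal base-R sketch (`mahalanobis()` and `pchisq()` are both in the `stats` package; the variable names are my own):

```r
mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.1,
                  0.5, 1.0, 0.1,
                  0.1, 0.1, 1.0), nrow = 3, byrow = TRUE)
x <- c(0.2, 1.3, -0.4)  # observed (T_1, T_2, T_3)

d2 <- mahalanobis(x, center = mu, cov = Sigma)     # squared distance, about 2.21
k <- length(x)                                     # degrees of freedom = dimension
p_value <- pchisq(d2, df = k, lower.tail = FALSE)  # P(chi^2_k > d^2)
p_value                                            # about 0.53
```

Note that this gives one $p$-value for the whole vector $(T_1, T_2, T_3)$ rather than one per hypothesis.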

Of course, only you can know exactly what hypothesis you want to test. I'm just showing you one common way that other people have approached this problem, but I can't know your specific situation in detail; I'm just guessing. Think carefully about which definition is most useful for what you are trying to accomplish!
