The Wikipedia entry on Mahalanobis Distance contains this note:
Another intuitive description of Mahalanobis distance is that it is square root of the negative log likelihood. That is, the exponential of the negative square of the Mahalanobis distance will give you the likelihood of your data point belonging to (a presumed normal) distribution of the sample points you already have.
How/where is this shown? What does it mean (I guess it's not so intuitive…)?
Best Answer
For completeness, I'll answer myself.
I eventually found the equation here, and as @whuber writes in his comment:
Define:
- $k$: the multivariate dimension
- $\mu$: the multivariate mean (a $k$-dimensional vector);
- $\Sigma$: the $k\times k$ covariance matrix.
Then:
- The squared Mahalanobis Distance is: $D^2=(x-\mu)^T\Sigma^{-1}(x-\mu)$
- The log-likelihood is: $\ln(L)=-\frac12\ln(|\Sigma|)-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)-\frac k2\ln(2\pi)$
or: $\ln(L)=-\frac12\ln(|\Sigma|)-\frac12 D^2-\frac k2\ln(2\pi)$
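As a quick numerical sanity check (a minimal sketch in Python, assuming NumPy and SciPy are available; the values of `mu`, `Sigma` and `x` are made up for illustration), the formula above matches `scipy.stats.multivariate_normal.logpdf`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative values (not from the original post), with k = 2
mu = np.array([1.0, -2.0])            # multivariate mean
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.5]])        # k x k covariance matrix
x = np.array([0.5, -1.0])             # the data point

k = len(mu)
diff = x - mu
D2 = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance

# Log-likelihood computed from the formula above
logL = -0.5 * np.log(np.linalg.det(Sigma)) - 0.5 * D2 - 0.5 * k * np.log(2 * np.pi)

# SciPy's multivariate normal log-density should agree
print(logL, multivariate_normal(mean=mu, cov=Sigma).logpdf(x))
```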
Thus, define $c\equiv\frac12\left(\ln(|\Sigma|) + k\cdot\ln(2\pi)\right)$; then:
- $\ln(L)=-\frac12 D^2-c$, and
- $D=\sqrt{-2(\ln(L)+c)}$

In other words, the likelihood of the point is proportional to $e^{-\frac12 D^2}$, which (up to the constant $c$) is what the Wikipedia note is describing.
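Continuing the same illustrative sketch, the constant $c$ lets you recover $D$ from the log-likelihood, matching a direct computation via `scipy.spatial.distance.mahalanobis`:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import multivariate_normal

# Same illustrative values as above
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.5]])
x = np.array([0.5, -1.0])
k = len(mu)

c = 0.5 * (np.log(np.linalg.det(Sigma)) + k * np.log(2 * np.pi))
logL = multivariate_normal(mean=mu, cov=Sigma).logpdf(x)

# Recover D from the log-likelihood and compare with the direct Mahalanobis distance
D_from_logL = np.sqrt(-2 * (logL + c))
D_direct = mahalanobis(x, mu, np.linalg.inv(Sigma))
print(D_from_logL, D_direct)   # the two values agree
```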