Mahalanobis distance, when used for classification, typically assumes a multivariate normal distribution, and the squared distances from the centroid should then follow a $\chi^2$ distribution (with $d$ degrees of freedom, where $d$ is the number of dimensions/features). We can use a new data point's Mahalanobis distance to calculate the probability of seeing a point at least that far from the centroid, and so judge whether it belongs to the set.
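To make the normal-theory setup concrete, here is a minimal sketch of that calculation. The function name `mahalanobis_sq` and the synthetic data are assumptions for illustration only; the $\chi^2$ tail probability is only approximate when the mean and covariance are estimated from a finite sample.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(X_train, x_new):
    """Squared Mahalanobis distance of x_new from the centroid of X_train."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    diff = x_new - mu
    # Solve cov @ z = diff instead of inverting cov explicitly.
    return float(diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(0)
d = 5                                   # number of features (illustrative)
X_train = rng.normal(size=(500, d))     # synthetic normal class, not real data
x_new = rng.normal(size=d)

d2 = mahalanobis_sq(X_train, x_new)
# Upper-tail probability under chi-square with d degrees of freedom.
p_value = chi2.sf(d2, df=d)
print(d2, p_value)
```

A small `p_value` suggests the point lies unusually far from the centroid under the normal model.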
I have data sets that do not follow a multivariate normal distribution ($d \approx 1000$). In theory, each feature should follow a Poisson distribution, and empirically this seems to be the case for many ($\approx 200$) features, and those that do not are in the noise and can be removed from the analysis. How can I classify new points on this data?
I guess there are two components:
- What is an appropriate "Mahalanobis distance" formula on this data (i.e. multivariate Poisson distribution)? Is there a generalization of the distance to other distributions?
- Whether I use the normal Mahalanobis distance or another formulation, what should the distribution of these distances be? Is there a different way to do the hypothesis test?
The number of known data points $n$ in each class varies widely, from $n=1$ (too few; I'll determine a minimum empirically) to around $n=6000$. The Mahalanobis distance scales with $n$, so distances from one model/class to the next cannot be directly compared. When the data is distributed normally, the chi-squared test provides a way to compare distances from different models (in addition to providing critical values or probabilities). If there is another way to directly compare the "Mahalanobis-like" distances, even if it does not provide probabilities, I could work with that.
You might want to check out Karlis and Meligkotsidou, "Multivariate Poisson regression with covariance structure" (2005). The paper describes the authors' attempts to model multivariate Poisson variables, which they acknowledge to be a difficult task.
Using the Mahalanobis distance implies that inference can be based on the mean and covariance matrix alone, which is a property specific to the normal distribution. If you use the MD on your data, you are essentially treating them as normal.
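One way to see how much that matters for a given data set is to simulate it. The sketch below is only an illustration under stated assumptions: the features are independent Poisson variables, and the rates in `lam`, the dimension, and the sample sizes are all made up. It computes squared Mahalanobis distances for held-out points and checks how often they exceed the nominal 1% $\chi^2$ critical value.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
d, n_train, n_test = 20, 5000, 5000
lam = rng.uniform(0.2, 2.0, size=d)     # hypothetical Poisson rates

# Independent Poisson features (an assumption; real features may be correlated).
train = rng.poisson(lam, size=(n_train, d))
test = rng.poisson(lam, size=(n_test, d))

mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False)
diff = test - mu
# Row-wise squared Mahalanobis distance: diff_i^T cov^{-1} diff_i.
d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)

threshold = chi2.ppf(0.99, df=d)        # nominal 1% critical value
tail_rate = np.mean(d2 > threshold)
print(f"nominal tail 1%, empirical rate: {tail_rate:.4f}")
```

If the empirical rate differs noticeably from 1%, the $\chi^2$ calibration is off for data like this, and critical values taken from the empirical distribution of `d2` may be a more honest reference.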