I have data on methylated gene expression that follow a bimodal distribution. Can I use a multivariate outlier analysis to detect differentially methylated genes? What are the various methods of doing this, and what assumptions come with them? I know similar analyses have been used for gene expression data, but they assume the data follow a multivariate normal distribution.
No, you ought not assume that.
Finding outliers in any data set is tricky; assumptions are dangerous. Even if the data ought to come from a particular distribution, outliers change the parameters of that distribution.
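To illustrate how outliers change the very parameters used to detect them, here is a minimal sketch. It is univariate for simplicity, and the sample sizes, the value 15.0, and the helper `z_score` are all made up for the illustration, not taken from your data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "clean" sample from a standard normal distribution.
clean = rng.normal(loc=0.0, scale=1.0, size=200)

# The same sample with ten copies of an extreme value appended.
contaminated = np.concatenate([clean, np.full(10, 15.0)])


def z_score(value, sample):
    """Distance of `value` from the sample mean, in sample standard deviations."""
    return (value - sample.mean()) / sample.std(ddof=1)


# Score the same extreme value against each set of estimated parameters.
z_clean = z_score(15.0, clean)
z_contaminated = z_score(15.0, contaminated)

print(f"z against clean estimates:        {z_clean:.2f}")
print(f"z against contaminated estimates: {z_contaminated:.2f}")
```

The extreme value looks much less extreme when the mean and standard deviation are estimated from data that already contain it, because the outliers pull the mean toward themselves and inflate the spread.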
If you know the form your data ought to take (that is, not just that it is multivariate and bimodal, but the parameters associated with the distribution), you could simulate the data and see how often you get values as extreme as the ones you think are outliers.
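The simulation idea could be sketched as follows. This assumes, purely for illustration, that the bimodal distribution is a two-component Gaussian mixture with known weights, means, and covariances, and it takes "as extreme" to mean "has a mixture density at least as low as the candidate point". The parameter values and the function names are hypothetical; substitute whatever form you actually believe your data take.

```python
import numpy as np


def mixture_log_density(x, weights, means, covs):
    """Log density of a Gaussian mixture evaluated at each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = x.shape[1]
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        # Squared Mahalanobis distance to this component.
        maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
        _, logdet = np.linalg.slogdet(cov)
        log_terms.append(np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    # Sum the component densities on the log scale.
    return np.logaddexp.reduce(np.stack(log_terms), axis=0)


def simulate_mixture(n, weights, means, covs, rng):
    """Draw n points from the mixture."""
    labels = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, len(means[0])))
    for k in range(len(weights)):
        idx = labels == k
        out[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return out


def tail_probability(candidate, weights, means, covs, n_sim=100_000, seed=0):
    """Fraction of simulated points whose density is no higher than the candidate's."""
    rng = np.random.default_rng(seed)
    sims = simulate_mixture(n_sim, weights, means, covs, rng)
    sim_ld = mixture_log_density(sims, weights, means, covs)
    cand_ld = mixture_log_density(candidate, weights, means, covs)[0]
    return float(np.mean(sim_ld <= cand_ld))


# Hypothetical parameters: two equally weighted modes in two dimensions.
weights = [0.5, 0.5]
means = np.array([[0.0, 0.0], [4.0, 4.0]])
covs = [np.eye(2), np.eye(2)]

p_typical = tail_probability([0.0, 0.0], weights, means, covs)
p_suspect = tail_probability([10.0, 10.0], weights, means, covs)
print(p_typical, p_suspect)
```

A small tail probability means points that extreme are rare under the assumed distribution. Note that this only works when the parameters are known or estimated from data you trust; estimating them from the same data that contain the suspected outliers runs into the problem described above.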
But I think maybe you don't need outlier detection so much as some form of regression.