I have two sets of continuous, unpaired measurements, one smaller than the other. I would like to visualize the two distributions with a pair of histograms, counting the measurements that fall within each bin's interval.
Are there ways to rescale or process the smaller of the two sets so that, when I make two histograms (or other visualizations: violin plots, box plots, etc.) of the data, the visualization does not lead the viewer to favor a bin where measurements from one set are under- or over-represented relative to the other?
Best Answer
If you really need to compare histograms from samples of different sizes, scale them both to area 1 (i.e., turn them into density estimates).
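The area-1 scaling can be sketched in a few lines. This is only an illustration using the Python standard library; the helper names `shared_edges` and `density_heights`, the simulated samples, and the choice of 20 equal-width bins are all assumptions, not part of the original answer. Each bar height is the bin count divided by (sample size × bin width), so the bars of each histogram enclose a total area of 1 regardless of sample size.

```python
import random
from bisect import bisect_right

def shared_edges(a, b, n_bins):
    """Equal-width bin edges spanning both samples (hypothetical helper).

    Assumes the pooled data are not all a single value.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]

def density_heights(values, edges):
    """Bar heights scaled so that sum(height * bin_width) equals 1."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = bisect_right(edges, v) - 1
        if v == edges[-1]:
            i = n_bins - 1  # treat the last bin as closed on the right
        if 0 <= i < n_bins:
            counts[i] += 1
    total = sum(counts)
    return [c / (total * (edges[i + 1] - edges[i]))
            for i, c in enumerate(counts)]

# Simulated data standing in for the two measurement sets (assumption).
random.seed(0)
large = [random.gauss(0.0, 1.0) for _ in range(2000)]
small = [random.gauss(0.5, 1.0) for _ in range(150)]

edges = shared_edges(large, small, 20)
large_heights = density_heights(large, edges)
small_heights = density_heights(small, edges)
```

Using the same bin edges for both samples keeps the bars directly comparable; the heights can then be passed to whatever bar-plotting function you prefer.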
However, as Nick suggested in comments, there are other ways of comparing the distributions that don't require binning.
For example, you could plot empirical CDFs (ECDFs), a pair of theoretical Q-Q plots on the same axes (the theoretical distribution doesn't need to fit perfectly, though a reasonable approximation will help with detailed comparisons), or perhaps kernel density estimates.
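As a rough sketch of the ECDF option: the empirical CDF at x is the fraction of the sample that is less than or equal to x, so it always runs from 0 to 1 and needs no binning or rescaling for unequal sample sizes. The function name `ecdf` and the toy data below are assumptions made for illustration.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function F where F(x) is the fraction of sample <= x."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return F

# Two toy samples of different sizes (illustrative values only).
F_large = ecdf([1.0, 2.0, 3.0, 4.0])
F_small = ecdf([1.0, 2.0])

# Evaluate both curves on a common grid so they can be drawn on the same axes.
grid = [0.0, 1.5, 2.0, 2.5, 4.0]
large_curve = [F_large(x) for x in grid]
small_curve = [F_small(x) for x in grid]
```

Drawing both curves as step functions on one set of axes shows where one sample has relatively more mass than the other.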