Solved – "All of these data points come from the same distribution." How to test?

I feel like I've seen this topic discussed here before, but I wasn't able to find anything specific. Then again, I'm also not really sure what to search for.

I have a one-dimensional set of ordered data. I hypothesize that all of the points in the set are drawn from the same distribution.

How can I test this hypothesis? Is it reasonable to test against a general alternative of "the observations in this data set are drawn from two different distributions"?

Ideally, I would like to identify which points come from the "other" distribution. Since my data is ordered, could I get away with identifying a cut point, after somehow testing whether it's "valid" to cut the data?

Edit: as per Glen_b's answer, I would be interested in strictly positive, unimodal distributions. I'd also be interested in the special case of assuming a distribution and then testing for different parameters.

Imagine two scenarios:

  1. the data points were all drawn from the same distribution — one that was uniform on (16,36)

  2. the data points were drawn from a 50-50 mix of two populations:

    a. population A, which is shaped like this:

       [image not available: density of population A]

    b. population B, shaped like this:

       [image not available: density of population B]

… such that the mixture of the two looks exactly like the case in 1.

How could they be told apart?

Whatever shapes you choose for the two populations, there is always a single population distribution with the same shape as their mixture. So in the general case you simply can't do it: nothing in the data alone can differentiate the two scenarios.
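To make the argument concrete, here is a minimal sketch in Python. The two triangular shapes are my own assumption (the original pictures are not available here): population A falls linearly across (16,36) and population B rises linearly, so their 50-50 mixture has density exactly 1/20, the same as the uniform. The helper `ks_distance_to_uniform` is a hypothetical name for a hand-rolled Kolmogorov-Smirnov distance.

```python
import numpy as np

LOW, HIGH = 16.0, 36.0
WIDTH = HIGH - LOW

def density_a(x):
    """Assumed population A: density falling linearly from LOW to HIGH."""
    return 2.0 * (HIGH - x) / WIDTH**2

def density_b(x):
    """Assumed population B: the mirror image, rising linearly from LOW to HIGH."""
    return 2.0 * (x - LOW) / WIDTH**2

def ks_distance_to_uniform(sample):
    """Largest gap between the empirical CDF and the Uniform(LOW, HIGH) CDF."""
    x = np.sort(sample)
    n = x.size
    cdf = (x - LOW) / WIDTH
    upper = np.arange(1, n + 1) / n - cdf
    lower = cdf - np.arange(0, n) / n
    return max(upper.max(), lower.max())

rng = np.random.default_rng(0)
n = 100_000

# Scenario 1: every point comes from a single Uniform(16, 36) population.
single = rng.uniform(LOW, HIGH, size=n)

# Scenario 2: each point comes from A or B with probability 1/2.
from_a = rng.random(n) < 0.5
draws_a = rng.triangular(LOW, LOW, HIGH, size=n)   # mode at the left end
draws_b = rng.triangular(LOW, HIGH, HIGH, size=n)  # mode at the right end
mixed = np.where(from_a, draws_a, draws_b)

# The mixture density is flat, so it coincides with the uniform density.
grid = np.linspace(LOW, HIGH, 201)
mixture_density = 0.5 * density_a(grid) + 0.5 * density_b(grid)

print("max deviation of mixture density from 1/20:",
      np.abs(mixture_density - 1.0 / WIDTH).max())
print("KS distance, single population:", ks_distance_to_uniform(single))
print("KS distance, 50-50 mixture:    ", ks_distance_to_uniform(mixed))
```

Both samples sit about equally close to the uniform CDF, which is the point: the pooled values carry no trace of which scenario produced them.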

If you introduce information about the populations (assumptions, effectively) then there may often be ways to proceed*, but the general case is dead.

* e.g. if you assume that populations are unimodal and have sufficiently different means you can get somewhere
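As a rough sketch of what "getting somewhere" could look like once assumptions are added: suppose, purely for illustration, that each population is normal. Then one can compare the fit of a single normal with the fit of a two-component normal mixture. The function names, the EM settings and the simulated data below are all my own assumptions, not anything from the question. Note also that the log-likelihood gain does not follow the usual chi-square reference distribution in mixture problems, so in practice it would need to be calibrated by simulating from the fitted single-population model.

```python
import numpy as np

def normal_logpdf(x, mean, sd):
    """Log density of Normal(mean, sd) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def loglik_one_normal(x):
    """Maximised log-likelihood under a single normal population."""
    return normal_logpdf(x, x.mean(), x.std()).sum()

def fit_two_normals(x, n_iter=200, min_sd=1e-3):
    """Plain EM for a two-component normal mixture (illustrative only).

    Returns (log-likelihood, weights, means, sds). The start is a split at
    the median; min_sd stops a component collapsing onto a single point.
    """
    x = np.sort(np.asarray(x, dtype=float))
    half = x.size // 2
    w = np.array([0.5, 0.5])
    mu = np.array([x[:half].mean(), x[half:].mean()])
    sd = np.maximum([x[:half].std(), x[half:].std()], min_sd)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        log_parts = np.log(w) + normal_logpdf(x[:, None], mu, sd)
        log_total = np.logaddexp(log_parts[:, 0], log_parts[:, 1])
        resp = np.exp(log_parts - log_total[:, None])
        # M-step: weighted updates of the weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sd = np.maximum(np.sqrt(var), min_sd)
    log_parts = np.log(w) + normal_logpdf(x[:, None], mu, sd)
    loglik = np.logaddexp(log_parts[:, 0], log_parts[:, 1]).sum()
    return loglik, w, mu, sd

def loglik_gain(x):
    """Twice the log-likelihood gain of two components over one, plus the fitted means."""
    ll2, _, mu, _ = fit_two_normals(x)
    return 2.0 * (ll2 - loglik_one_normal(x)), mu

rng = np.random.default_rng(1)
# Simulated data (assumed): one population versus two well-separated ones.
single = rng.normal(26.0, 2.0, size=300)
mixed = np.concatenate([rng.normal(20.0, 2.0, 150), rng.normal(32.0, 2.0, 150)])

gain_single, _ = loglik_gain(single)
gain_mixed, mu_mixed = loglik_gain(mixed)
print("gain, single population:", gain_single)
print("gain, two populations:  ", gain_mixed)
print("fitted means for the mixed data:", mu_mixed)
```

With means this far apart the gain is large and the fitted responsibilities also suggest which points belong to which component. As the means move together the gain shrinks and the two scenarios become hard to separate again.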

[The restrictions that were added to the question are not sufficient to avoid a different version of the kind of problem I describe above: we can still write a unimodal null on the positive half-line as a 50-50 mixture of two unimodal distributions on the positive half-line. Of course, if you have a more specific null, this becomes much less of an issue. Alternatively, it should still be possible to restrict the class of alternatives further until we are in a position to test against some mixture alternative. Or some additional restrictions might be applied to both null and alternative that would make them distinguishable.]
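One concrete instance of this, chosen by me as an illustration rather than taken from the question: an Exponential(1) null, which is unimodal on the positive half-line, is exactly a 50-50 mixture of the distribution of the smaller and the distribution of the larger of two independent Exponential(1) draws. The smaller one has density 2exp(-2x) and the larger has density 2exp(-x)(1 - exp(-x)); both are unimodal. A short numerical check:

```python
import numpy as np

x = np.linspace(0.0, 20.0, 2001)

null_density = np.exp(-x)                       # Exponential(1), mode at 0
part_a = 2.0 * np.exp(-2.0 * x)                 # min of two Exp(1) draws, mode at 0
part_b = 2.0 * np.exp(-x) * (1.0 - np.exp(-x))  # max of two Exp(1) draws, mode at log(2)

# The 50-50 mixture of the two parts reproduces the null density.
mixture = 0.5 * part_a + 0.5 * part_b
print("max deviation from Exponential(1):", np.abs(mixture - null_density).max())

# part_b rises to a single peak and then falls, so it is unimodal.
peak = np.argmax(part_b)
steps = np.diff(part_b)
print("peak of part_b near:", x[peak], "log(2) =", np.log(2.0))
print("single peak:", bool(np.all(steps[:peak] > 0) and np.all(steps[peak:] < 0)))
```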
