Is there a way to test if data are (or at least seem) randomly sampled?
In other words, is there a way to test, at some level of statistical significance, whether my data come from a simple random sample rather than, for example, a complex survey design?
I imagine something like comparing means over repeated sub-sampling.
Or is this impossible? If so, why?
Best Answer
Taking a simple random sample means that every possible sample of a given size has an equal probability of being the sample drawn. It follows that any sample that could have come from a more complex sampling scheme (stratified, cluster, etc.) could also have come from a simple random sample. So the data alone cannot definitively prove which scheme produced them.
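To make the argument above concrete, here is a minimal sketch that enumerates every possible sample from a toy population. The four units, the two strata "A" and "B", and the sample size of 2 are all made up for illustration; the stratified design is assumed to take exactly one unit per stratum.

```python
from itertools import combinations, product

# Hypothetical toy population: two strata with two units each.
strata = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
population = [unit for units in strata.values() for unit in units]
n = 2  # sample size

# Simple random sampling: every subset of size n is possible and equally likely.
srs_samples = {frozenset(s) for s in combinations(population, n)}

# Stratified sampling (assumed design): exactly one unit from each stratum.
stratified_samples = {frozenset(s) for s in product(*strata.values())}

print(len(srs_samples))                   # 6 possible samples under SRS
print(len(stratified_samples))            # 4 possible samples under stratification
print(stratified_samples <= srs_samples)  # True: each one is also a valid SRS outcome
```

Every sample the stratified design can produce also appears in the SRS list, so seeing one of them does not rule out simple random sampling. What differs between the schemes is only how probable each sample is.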
However, you could specify a prior on how likely each candidate sampling scheme is, then do a Bayesian analysis to find the posterior probability of a simple random sample versus the other schemes, given the sample you observed.
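As a rough sketch of what such a Bayesian comparison could look like, the function below compares just two hypotheses: a simple random sample versus a stratified sample with a fixed, known allocation per stratum. The function name `posterior_srs`, the stratum sizes, the allocation, and the 50/50 prior are all assumptions chosen for illustration. A real analysis would need a prior over every plausible design.

```python
from math import comb

def posterior_srs(stratum_sizes, allocation, observed_counts, prior_srs=0.5):
    """Posterior probability of SRS vs. one fixed-allocation stratified design.

    stratum_sizes   -- population size of each stratum (assumed known)
    allocation      -- units the stratified design takes from each stratum
    observed_counts -- units from each stratum in the sample actually seen
    """
    N = sum(stratum_sizes)
    n = sum(observed_counts)

    # Under SRS, each specific sample of size n has probability 1 / C(N, n).
    like_srs = 1 / comb(N, n)

    # Under the stratified design, a specific sample is possible only if its
    # stratum counts match the allocation exactly.
    if tuple(observed_counts) == tuple(allocation):
        n_possible = 1
        for N_h, n_h in zip(stratum_sizes, allocation):
            n_possible *= comb(N_h, n_h)
        like_strat = 1 / n_possible
    else:
        like_strat = 0.0

    # Bayes' theorem over the two hypotheses.
    numerator = prior_srs * like_srs
    return numerator / (numerator + (1 - prior_srs) * like_strat)

# Sample matches the stratified allocation: posterior shifts away from SRS.
p_match = posterior_srs((60, 40), (6, 4), (6, 4))
# Sample does not match: the stratified design is ruled out.
p_mismatch = posterior_srs((60, 40), (6, 4), (7, 3))
print(p_match, p_mismatch)
```

Note the asymmetry: a sample whose stratum counts break the allocation rules out that stratified design entirely, whereas a sample that matches it only lowers the posterior probability of SRS and never drives it to zero.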