I'm comparing two data sets from nearby locations to see if they differ significantly in air quality. I cannot perform any paired tests because the data at one location do not necessarily correspond in time to the data at the other location (e.g. one sample at location A was taken from Monday to Friday, while another sample at location B was taken from Tuesday to Saturday). In addition, I have some duplicate samples (two samples taken over the same few days at the same location, but separate physical samples, not duplicate analyses of one sample). I have been advised to compare the CDFs of the two locations. Is this good advice? If so, I can use the K-S test here. Are there any other tests I could use?

#### Best Answer

Bill Huber is an expert on spatial data, and I think he has given you good advice. Comparing CDFs may be too simplistic when spatial and temporal effects are present and possibly different at the two locations. But there is also a certain amount of aggregation in the data.

Having worked in industry for many years, I know that if the bosses want things done a certain way, sometimes you have no choice but to give it to them that way. Just be careful to provide all the important caveats so that they don't misinterpret the results. Your basic questions can be answered without getting into the nitty-gritty details of the data.

If you have two data sets, there are a number of tests, called empirical CDF tests, that compare the two sample CDFs and look for specific differences. The Kolmogorov-Smirnov test is perhaps the best known. It looks at the maximum absolute difference between the two CDFs over the entire range of the data. You can also create histograms of the data, constructing the same bins for both data sets. There is a form of the chi-square test that can be used to see if the frequencies in the bins for one group are similar to the frequencies in the other. This can be done using contingency tables. For the contingency table approach there are also exact permutation tests (e.g. Fisher's exact test) that can be used.
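As a rough illustration of the tests mentioned above, here is a minimal Python sketch using SciPy. The arrays `site_a` and `site_b` are made-up lognormal values standing in for the measurements at the two locations, and the choice of quartile bins and a median split is an assumption for the example, not something prescribed by the tests. All three tests treat the observations as independent, so the caveats about spatial and temporal effects still apply.

```python
import numpy as np
from scipy import stats

# Hypothetical data: simulated concentrations at two locations.
rng = np.random.default_rng(0)
site_a = rng.lognormal(mean=2.0, sigma=0.5, size=40)
site_b = rng.lognormal(mean=2.3, sigma=0.5, size=35)

# 1) Two-sample Kolmogorov-Smirnov test: the statistic is the maximum
#    absolute difference between the two empirical CDFs.
ks = stats.ks_2samp(site_a, site_b)

# The same statistic computed by hand, evaluating both empirical CDFs
# at every observed value.
pooled = np.concatenate([site_a, site_b])
ecdf_a = np.searchsorted(np.sort(site_a), pooled, side="right") / site_a.size
ecdf_b = np.searchsorted(np.sort(site_b), pooled, side="right") / site_b.size
d_manual = np.max(np.abs(ecdf_a - ecdf_b))

# 2) Chi-square test on histograms that share the same bins.
#    Bin edges here are the quartiles of the pooled data (an assumption).
edges = np.quantile(pooled, [0.0, 0.25, 0.5, 0.75, 1.0])
counts_a, _ = np.histogram(site_a, bins=edges)
counts_b, _ = np.histogram(site_b, bins=edges)
table = np.vstack([counts_a, counts_b])  # 2 x 4 contingency table
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# 3) Fisher's exact test on a 2 x 2 table: counts above and at-or-below
#    the pooled median at each location.
median = np.median(pooled)
table_2x2 = np.array([
    [np.sum(site_a > median), np.sum(site_a <= median)],
    [np.sum(site_b > median), np.sum(site_b <= median)],
])
odds_ratio, p_fisher = stats.fisher_exact(table_2x2)

print(f"K-S: D = {ks.statistic:.3f}, p = {ks.pvalue:.4f}")
print(f"Chi-square: chi2 = {chi2:.3f}, dof = {dof}, p = {p_chi2:.4f}")
print(f"Fisher exact (median split): p = {p_fisher:.4f}")
```

The binned tests depend on how the bins are chosen, so it is worth reporting the bin edges alongside the p-values.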
