Which method to use to compare two sets of data for similarity?

I need to validate that my simulation results are correct. I have two sets of data to compare: my simulation results, and calculations obtained from an established commercial software package. But I'm not sure what the best way to do it is.

I don't know if this is relevant, but the datasets are large (> 20k points). The scale also varies a lot. For example:

    X1       X2
    1205.3   1206.7
    1245.7   1242.1
    0.53     0.21
    428.1    428.3

I think the absolute difference matters more to me than the relative one: for very low numbers the relative difference can be huge, even though a difference between 0.53 and 0.21 isn't relevant at all in this simulation.

The best I can think of is the mean of the absolute differences |X2 – X1|, although I'm not sure this is very scientific. Maybe I could show a histogram of the differences instead?
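For concreteness, here is a minimal sketch of the two ideas above (mean absolute difference and a histogram of the differences), using only the four example pairs shown earlier. The function names `difference_summary` and `histogram` and the bin width of 1.0 are my own choices for illustration, not anything prescribed by either piece of software.

```python
import math
from statistics import mean, median

# The four example pairs from the question (X1 vs. X2).
x1 = [1205.3, 1245.7, 0.53, 428.1]
x2 = [1206.7, 1242.1, 0.21, 428.3]


def difference_summary(a, b):
    """Summarise the paired differences b - a (hypothetical helper)."""
    diffs = [bi - ai for ai, bi in zip(a, b)]
    abs_diffs = [abs(d) for d in diffs]
    return {
        "mean_diff": mean(diffs),            # signed: shows a systematic offset
        "mean_abs_diff": mean(abs_diffs),    # the metric proposed above
        "median_abs_diff": median(abs_diffs),
        "max_abs_diff": max(abs_diffs),      # worst single disagreement
    }


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = {}
    for v in values:
        edge = math.floor(v / bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


summary = difference_summary(x1, x2)
bins = histogram([b - a for a, b in zip(x1, x2)], bin_width=1.0)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
for edge, count in bins.items():
    print(f"[{edge:+.1f}, {edge + 1.0:+.1f}): {count}")
```

Reporting the signed mean next to the mean absolute difference helps separate a consistent offset from scatter that averages out, and the maximum shows whether a few points disagree badly even when the average looks fine.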


Best Answer

The Kolmogorov-Smirnov test can be used to check whether the two distributions differ. If the points are paired (i.e., for every simulation point there is a corresponding reference value), you could use a paired t-test instead. With this many data points the tests have a lot of power, so you are unlikely to erroneously accept the null hypothesis (i.e., conclude that the means are not statistically different when they are). The flip side is that a difference too small to matter in practice can still come out as statistically significant, so look at the size of the difference as well as the p-value. Caveat: these tests do not technically validate whether your simulation is correct; they only help determine whether the two data sets are statistically different. No model is correct; there is only the best-fitting one. If you test for statistical differences and there are none, you are likely okay.
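As a rough illustration of the two tests, the sketch below computes the paired t statistic and the two-sample Kolmogorov-Smirnov statistic with the standard library only. The data are synthetic and made up purely for the demo (uniform reference values plus noise with an assumed offset of 0.1), the function names are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples. In practice you would more likely call `scipy.stats.ttest_rel` and `scipy.stats.ks_2samp`, which also return proper p-values.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def paired_t(a, b):
    """Paired t statistic for b - a, with a large-sample two-sided p-value."""
    diffs = [bi - ai for ai, bi in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    # Normal approximation to the t distribution (assumes n is large).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sa + sb:  # the largest gap always occurs at a data point
        fa = bisect_right(sa, v) / len(sa)
        fb = bisect_right(sb, v) / len(sb)
        d = max(d, abs(fa - fb))
    return d


# Synthetic stand-in data: NOT real simulation output.
rng = random.Random(0)
reference = [rng.uniform(0.0, 1500.0) for _ in range(20000)]
simulated = [r + rng.gauss(0.1, 1.0) for r in reference]

t_stat, p_value = paired_t(reference, simulated)
d_stat = ks_statistic(reference, simulated)
print(f"paired t = {t_stat:.2f}, approx. p = {p_value:.3g}")
print(f"KS statistic D = {d_stat:.4f}")
```

The paired test works on the point-by-point differences, so it is sensitive to a small consistent offset. The KS statistic compares the two overall distributions and ignores the pairing, so it can miss pointwise disagreement when the values span a wide range.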
