I compared two data sets using the two-sample KS test. The first set, X1, is empirical data; the second, X2, is the expected data, randomly sampled from a normal distribution with mean $\mu$ and standard deviation $\sigma$.
The length of X2 is $10^6$. In the plot of their CDFs, the two sets appear to have similar distributions. When I ran the KS test with an X1 of length $10^3$, the result was H=0, which was correct.
However, when I ran the KS test with an X1 longer than $10^3$, I got wrong results (H=1), even though the plot showed that the two sets follow similar distributions. I attached the plot from a case where I got the wrong result. For the plot, I used X1 size = $10^6$ and X2 size = $10^6$. The empirical CDF is in red and the expected CDF is in blue. I used the standard Matlab command (kstest2).
Does anyone have an opinion on this issue?
Best Answer
When I did KS-test with X1 length is $10^3$, the result is H=0, which was correct.
This is incorrect. You have failed to reject your null hypothesis, but that doesn't mean H=0 is the correct answer, i.e. that the two samples come from the same distribution. All you can say is that, at that particular sample size, the test is not powerful enough to reject the null hypothesis and conclude that your empirical CDFs are statistically different.
However, when I tried KS-test with X1 length > $10^3$, I got wrong results (H=1)
The result is correct. With more samples, the KS test correctly rejects your null hypothesis. It should be rejected because the two distributions are similar but not identical.
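To make the sample-size effect concrete, here is a rough sketch in Python (standard library only) of the textbook large-sample rejection threshold for the two-sample KS statistic $D$, the largest vertical gap between the two empirical CDFs. It assumes the 5% significance level, which is the default in kstest2, and it uses the plain asymptotic formula, so it will not reproduce kstest2's internal p-value computation exactly. The function name `ks2_critical_value` and the gap of 0.005 are hypothetical, chosen for illustration and not taken from your data.

```python
import math


def ks2_critical_value(n, m, alpha=0.05):
    """Approximate large-sample threshold for the two-sample KS statistic D.

    The null hypothesis is rejected when D exceeds
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


m = 10**6                # size of X2, as in the question
hypothetical_gap = 0.005 # illustrative max CDF gap, NOT measured from the data

for n in (10**3, 10**4, 10**5, 10**6):
    threshold = ks2_critical_value(n, m)
    rejected = hypothetical_gap > threshold
    print(f"n = {n:>7}: threshold = {threshold:.5f}, H = {int(rejected)}")
```

With X2 fixed at $10^6$ points, the threshold is roughly 0.043 for an X1 of length $10^3$ and roughly 0.0019 for an X1 of length $10^6$. A gap between the CDFs that is far too small to see in a plot can therefore sit below the threshold at the small sample size and above it at the large one.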
Practically, I wouldn't even bother to run a KS test here. It's clear the two distributions are very close, so why bother? Statistics is not magic; it can't tell you anything that is not in your data.