Here is an experiment I did:
- I bootstrapped a sample $S$ and stored the result as an empirical distribution under the name $S_1$.
- Then I bootstrapped the same sample $S$ another $10000$ times in a row and compared each resulting empirical distribution $S_i$ with $S_1$ using the Kolmogorov-Smirnov test.
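
A minimal sketch of this procedure, using only the Python standard library. The original data are not shown here, so the sample `S` (200 standard-normal draws), the helper names `bootstrap`, `ks_statistic` and `ks_pvalue`, and the reduced count of 1000 resamples are all assumptions made for illustration. The p-value uses the asymptotic Kolmogorov series, which is only approximate when the samples contain ties, as bootstrap samples do.

```python
import bisect
import math
import random


def bootstrap(sample, rng):
    """Resample with replacement, keeping the original sample size."""
    return rng.choices(sample, k=len(sample))


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value (approximate, especially with ties)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.1:  # the series converges poorly near zero; p is ~1 there
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


rng = random.Random(0)
S = [rng.gauss(0, 1) for _ in range(200)]  # hypothetical original sample
S1 = bootstrap(S, rng)                     # reference bootstrap sample

d_values, p_values = [], []
for _ in range(1000):  # fewer repetitions than in the post, to keep it quick
    Si = bootstrap(S, rng)
    d = ks_statistic(S1, Si)
    d_values.append(d)
    p_values.append(ks_pvalue(d, len(S1), len(Si)))

print(f"D ranges from {min(d_values):.3f} to {max(d_values):.3f}")
print(f"p ranges from {min(p_values):.3f} to {max(p_values):.3f}")
```

In practice a library routine such as `scipy.stats.ks_2samp` would likely replace the two hand-written KS helpers.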
Results from the experiment: The comparisons return different $p$-values (from $0.01$ to $0.99$) and different $D$ values (from $0.02$ to $0.06$).
Is that expected? If I bootstrap the same sample 1000 times, shouldn't all 1000 empirical distributions come from the same distribution?
If so, should I try to establish the distribution of the empirical distributions ($S_1$, $S_i$)?
Three empirical distributions $S_1$, $S_2$, $S_3$ bootstrapped from the same initial sample $S$:
- $S_1$: 1, 2, 3, 4, 5, 6
- $S_2$: 1, 3, 4, 5, 6, 7
- $S_3$: 2, 4, 5, 6, 7, 8
If I add them up I get:
I think I understand your problem now. You seem to assume that the KS test should show every bootstrapped sample to be from the same distribution as the original sample. However, consider this: what does it mean to show that they're from the same distribution?
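It usually means that the p-value is above some significance level $\alpha$. If the bootstrapping is done properly, the p-value will sometimes be above $\alpha$ and sometimes below it: when the null hypothesis holds, p-values are approximately uniformly distributed on $[0, 1]$, so a spread from $0.01$ to $0.99$ is what you should expect. Build the distribution of the test statistics you get from running the KS test on the bootstrapped samples. Then look at the p-values at various critical values: they should match the theoretical values for which the KS test was designed.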
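
The check described above could be sketched roughly as follows, using only the Python standard library. Several things here are assumptions for illustration, not part of the original experiment: the sample `S` (200 standard-normal draws), the helper names, the choice to redraw both bootstrap samples for every comparison so that the comparisons are independent, and the asymptotic critical value $\sqrt{-\tfrac{1}{2}\ln(\alpha/2)}\,\sqrt{(n+m)/(nm)}$. Because bootstrap samples contain ties and the critical value is asymptotic, the observed rejection rates should only land in the neighbourhood of $\alpha$, not match it exactly.

```python
import bisect
import math
import random


def bootstrap(sample, rng):
    """Resample with replacement, keeping the original sample size."""
    return rng.choices(sample, k=len(sample))


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def critical_value(alpha, n, m):
    """Asymptotic two-sided critical value for the two-sample KS statistic."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))


rng = random.Random(1)
S = [rng.gauss(0, 1) for _ in range(200)]  # hypothetical original sample
n = len(S)

# Assumption: both samples are redrawn each time, so comparisons are independent.
d_values = [ks_statistic(bootstrap(S, rng), bootstrap(S, rng))
            for _ in range(1000)]

# Fraction of comparisons that would be rejected at each significance level.
rejection_rate = {}
for alpha in (0.10, 0.05, 0.01):
    crit = critical_value(alpha, n, n)
    rejection_rate[alpha] = sum(d > crit for d in d_values) / len(d_values)
    print(f"alpha={alpha:.2f}  critical D={crit:.4f}  "
          f"observed rejection rate={rejection_rate[alpha]:.3f}")
```

A histogram of `d_values` gives the distribution of the test statistic itself, if a visual comparison is preferred over the table of rejection rates.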