Solved – Is it valid to apply a t-test on scores obtained by cross validation

I am running cross-validation with a 12% holdout, and I have repeated it many times, which gives me a distribution of scores for each method.

I'd like to do some sort of hypothesis test to compare the methods. Each score is a single value between 0 and 1. A histogram of the scores looks approximately normal.

Could a t-test be valid in this scenario?

I would say no, for several reasons:

1. The individual scores are not independent, because the same samples are reused across repetitions.
2. The distribution is confined to [0,1], so it is truncated and not exactly normal (though it could be approximately normal if the truncation is not too severe).
3. Saying "some sort of hypothesis test" doesn't tell us what you want to do. How many methods are you comparing? If it is more than 2, are you comparing them pairwise? If one method has a higher average score, what does that tell you?

Maybe a nonparametric ANOVA is really what you need.
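To make the "nonparametric ANOVA" suggestion concrete, here is a minimal sketch. It assumes the Kruskal-Wallis test is the one meant, which the answer does not actually say. The scores are simulated placeholders, and the method names, the helper functions and the permutation p-value are all illustrative choices. Note that the permutation step treats the scores as exchangeable, which reason 1 calls into question for repeated holdout, so read the p-value as a rough guide only.

```python
import random
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Estimate a p-value for H by shuffling the method labels."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kruskal_h(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Simulated placeholder scores in [0, 1]; NOT real results.
_rng = random.Random(42)


def simulate_scores(mean, n=30, sd=0.03):
    return [min(1.0, max(0.0, _rng.gauss(mean, sd))) for _ in range(n)]


scores = {
    "method_a": simulate_scores(0.80),  # hypothetical method names
    "method_b": simulate_scores(0.82),
    "method_c": simulate_scores(0.81),
}

h_stat, p_value = permutation_p_value(list(scores.values()))
print(f"H = {h_stat:.3f}, permutation p = {p_value:.4f}")
```

If every method is scored on the same splits, a paired rank-based test (for example the Friedman test) would fit the design better than this unpaired one, and a significant result would still need pairwise follow-up comparisons to say which methods differ.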
