Solved – Should train and test datasets have similar variance?

If the variance of the test dataset is lower than that of the training dataset, is it still worth splitting the data? Since our dataset will always be limited, is it fair to select models under this condition? Thanks

You first have to figure out why you are splitting the data. The only reason that comes immediately to mind is that fitting the model is so laborious that you can only do it once. Otherwise, resampling methods are far better: start with the Efron-Gong optimism bootstrap (see e.g. the R rms package) or 10-fold cross-validation repeated 100 times.
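The two resampling approaches above can be sketched in plain Python. This is a minimal illustration, not the rms package's implementation. It assumes a hypothetical one-predictor least-squares model (`fit_line`), mean squared error as the accuracy measure, and simulated data. The optimism bootstrap refits the model on each bootstrap sample. It then measures how much worse that fit does on the original data than on its own sample, and adds the average of that "optimism" to the apparent (in-sample) error. Repeated k-fold CV averages held-out error over many random fold assignments.

```python
import random
import statistics


def fit_line(xs, ys):
    """Hypothetical model: ordinary least squares for y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted (intercept, slope) model."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def optimism_bootstrap(xs, ys, n_boot=200, seed=0):
    """Efron-Gong optimism-corrected MSE, using every observation for fitting."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = mse(fit_line(xs, ys), xs, ys)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        m = fit_line(bx, by)
        # Error on the original data minus error on the bootstrap sample it was fit to
        optimism.append(mse(m, xs, ys) - mse(m, bx, by))
    return apparent + statistics.fmean(optimism)


def repeated_kfold_cv(xs, ys, k=10, repeats=100, seed=0):
    """Average held-out MSE over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            fold = idx[f::k]  # every k-th shuffled index forms one fold
            held = set(fold)
            train = [i for i in idx if i not in held]
            m = fit_line([xs[i] for i in train], [ys[i] for i in train])
            scores.append(mse(m, [xs[i] for i in fold], [ys[i] for i in fold]))
    return statistics.fmean(scores)


# Usage on simulated data (n = 50); no train/test split is needed
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

apparent = mse(fit_line(xs, ys), xs, ys)
corrected = optimism_bootstrap(xs, ys)
cv_mse = repeated_kfold_cv(xs, ys)
print(f"apparent MSE:          {apparent:.3f}")
print(f"optimism-corrected:    {corrected:.3f}")
print(f"10-fold CV x 100 reps: {cv_mse:.3f}")
```

Both estimates use all 50 observations for model fitting, which is the point of the answer: a single split would spend part of an already limited sample on a one-off test set whose variance, as the question notes, may not match the training data.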
