I was researching k-fold cross-validation and read that one should train on k−1 of the k partitions, test on the remaining partition, repeat for each partition, and average the results to estimate model performance. This I understand. However, if there is no validation dataset, when should training be stopped? Early stopping is not possible, so the model cannot simply be trained until generalisation performance starts to worsen. Should training stop after a set number of epochs, or when the gradient falls below a certain threshold? Are there any tips for choosing these stopping parameters?
If there is no validation set, make one: hold a few samples out of each training fold and use them for early stopping.
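A minimal sketch of that idea, using numpy only: a toy logistic-regression training loop where an inner validation split is carved out of the training fold and training stops once validation loss has not improved for `patience` epochs. All names, the toy data, and the patience value are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for one training fold from the outer CV loop.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)

# Carve an inner validation split (here 20%) out of the training fold.
n_val = len(X) // 5
X_val, y_val = X[:n_val], y[:n_val]
X_tr, y_tr = X[n_val:], y[n_val:]

def val_loss(w):
    # Cross-entropy on the held-out inner split.
    p = 1.0 / (1.0 + np.exp(-(X_val @ w)))
    return -np.mean(y_val * np.log(p + 1e-12) + (1 - y_val) * np.log(1 - p + 1e-12))

# Gradient descent with patience-based early stopping.
w = np.zeros(5)
best_w, best_loss = w.copy(), np.inf
patience, bad_epochs = 10, 0
for epoch in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w)))
    w -= 0.1 * (X_tr.T @ (p - y_tr)) / len(y_tr)  # gradient step
    loss = val_loss(w)
    if loss < best_loss - 1e-6:
        best_w, best_loss, bad_epochs = w.copy(), loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # validation loss stopped improving
            break
```

`best_w` is then the model you evaluate on the held-out test partition of that fold; the inner split is used only to decide when to stop.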
Other options are:
- Train until the training error converges. If you have enough data and the model is regularised, overfitting is limited and training error becomes a reliable stopping signal.
- Look up the "Optimized Approximation Algorithm" paper; it describes a stopping method based on analysing the signal-to-noise ratio of the training error. I have no practical experience with the method, though, so I can't say how effective it is.
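The train-to-convergence option above can also be sketched concretely. This is an assumed toy setup (ridge-regularised linear regression, invented tolerances), not the method from the cited paper: with the regulariser in place we simply run gradient descent until either the loss improvement or the gradient norm falls below a threshold, which also answers the question about gradient-based stopping limits.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

# Ridge-regularised linear regression trained by gradient descent.
lam, lr = 1.0, 0.01      # illustrative regularisation strength and step size
w = np.zeros(5)
prev_loss = np.inf
for epoch in range(10_000):
    resid = X @ w - y
    grad = X.T @ resid / len(y) + lam * w
    loss = np.mean(resid**2) / 2 + lam * (w @ w) / 2
    # Two common stopping rules: tiny loss improvement, or tiny gradient norm.
    if prev_loss - loss < 1e-9 or np.linalg.norm(grad) < 1e-6:
        break
    prev_loss = loss
    w -= lr * grad  # gradient step
```

The thresholds (`1e-9`, `1e-6`) are problem-dependent guesses; in practice you would scale them to your loss magnitude rather than copy them verbatim.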