Solved – Training – Validation – Testing Usage

Just a note concerning how the dataset is split and used. If I understand this correctly, we use the validation set to confirm that parameter changes are giving us positive results, and the test set is set aside until the very end, after we have optimized the model.

So if I had a default SVM, I would use the validation set after, say, changing the kernel or the gamma to see the performance difference. Then, once I have my final optimized model, I would use the test set to see how well it generalizes to unseen data?

In simple words, yes. What you have written makes perfect sense. It is advisable to split your data into three parts:

  1. Training set – the data on which you actually fit the model to learn its parameters.
  2. Validation (cross-validation) set – used to compare candidate models and tune hyperparameters; it indicates whether your model has actually learned something useful from the training data.
  3. Test set – held out until the very end and used once to estimate how well the final model generalizes.
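The three-way split above can be sketched in plain Python. The helper name and the 60/20/20 fractions here are illustrative assumptions, not a fixed recipe:

```python
import random

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and split a dataset into train/validation/test parts
    (hypothetical helper; fractions default to 60/20/20)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # shuffle so the split is random but reproducible
    n_train = int(train_frac * len(data))
    n_val = int(val_frac * len(data))
    train = [data[i] for i in idx[:n_train]]
    val = [data[i] for i in idx[n_train:n_train + n_val]]
    test = [data[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = three_way_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before splitting matters: if the data is ordered (e.g. by class), a straight slice would give the three sets different distributions.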

Mostly, you only get a single pool of training data, so you have to split it yourself into training, validation, and test sets. A common ratio is

train:cross-validate:test=60:20:20

This is what I usually choose. Splitting this way is advisable because if you use the same data for both training and evaluation, you have no reliable estimate of how the model will perform in the real world on unseen data.
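The workflow from the question, tuning an SVM's kernel and gamma on the validation set and then scoring once on the test set, can be sketched with scikit-learn. The synthetic dataset and the small parameter grid are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data stands in for a real dataset (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 60/20/20 split: first carve off the 20% test set,
# then take 25% of the remaining 80% as the validation set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

# Try a few kernel/gamma settings; pick the one with the best
# validation accuracy (the test set is never touched here)
best_score, best_model = -1.0, None
for kernel in ("rbf", "linear"):
    for gamma in ("scale", 0.1, 1.0):
        model = SVC(kernel=kernel, gamma=gamma).fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_score, best_model = score, model

# Only now use the test set, once, for the final generalization estimate
print("test accuracy:", best_model.score(X_test, y_test))
```

The key point is that the test score is computed exactly once, after all tuning decisions are made; evaluating on the test set during tuning would leak information and inflate the estimate.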
