I have been searching the internet for an exact definition of cross-validation. I have come across a few different ideas, with different terminology, and I'm not sure I have understood it correctly.
Basically, my current understanding is that there are two major applications of cross-validation.
Hyper-parameter tuning. Lasso has a parameter $\lambda$. We don't know which $\lambda$ we should use. So we split the data into training and testing sets, try different $\lambda$ values on these 'sub-problems', and see which $\lambda$ gives the best performance.
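A minimal sketch of this first use case with scikit-learn (the synthetic dataset and the candidate $\lambda$ grid are my own assumptions for illustration; note scikit-learn calls the Lasso penalty `alpha` rather than $\lambda$):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Each candidate alpha is scored by 5-fold cross-validation:
# the data is repeatedly split into training/testing "sub-problems"
grid = GridSearchCV(
    Lasso(max_iter=10000),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)  # the alpha with the best average out-of-fold score
```

`GridSearchCV` then refits the model on all the data with the winning `alpha`, so `grid` can be used directly for prediction.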
Model validation. Imagine that I have implemented both Lasso and Gradient Boosted Regression Trees. I want to know which one would work better in real life (predicting new, unseen data). So I split the data into training/testing parts and choose the one that yields better out-of-sample performance in cross-validation.
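The second use case, comparing two model families on the same folds, might look like this (again with a synthetic dataset and arbitrarily chosen hyperparameters, purely as a sketch):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Score both models with the same 5-fold split; cross_val_score reports
# the out-of-fold R^2 for each fold
for name, model in [
    ("lasso", Lasso(alpha=1.0, max_iter=10000)),
    ("gbrt", GradientBoostingRegressor(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```

You would then pick whichever model has the better average out-of-sample score.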
Is my understanding correct?
Thanks
Best Answer
I'd say mostly the first (i.e. Hyper-parameter tuning).
If you have a sufficiently large hold-out test set, you can evaluate models fairly reliably that way. But when selecting hyperparameters, a single fixed validation set invites overfitting to that particular set; CV, by averaging performance over several different splits, makes that much harder.
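The workflow the answer implies, CV for tuning on the training portion, a held-out test set for the final evaluation, can be sketched like this (dataset, split ratio, and `alpha` grid are all illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

# Hold out a test set for the final evaluation; CV never sees it
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hyperparameters are tuned by cross-validation on the training part only
grid = GridSearchCV(Lasso(max_iter=10000), {"alpha": [0.01, 0.1, 1.0]}, cv=5)
grid.fit(X_tr, y_tr)

# The untouched test set gives a less optimistic estimate of performance
print(grid.score(X_te, y_te))
```

This keeps the tuning step from peeking at the data used for the final model assessment.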
Similar Posts:
- Solved – CV for model parameter tuning AND then model evaluation
- Solved – Cross-validation accuracy interpretation -( accuracy of 100%