When conducting a grid search over a range of parameters of a predictive model that is itself subject to randomness (such as a random forest with bagged features), should you set a seed for the predictive model so that the model is initialized identically for each round of the grid search? It seems intuitive that you only want to test the parameters, so the less variance the better, but is there any scientific consensus on this?
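For concreteness, a minimal sketch (assuming scikit-learn, with hypothetical parameter values) of the two setups in question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {"max_depth": [3, 5, None], "max_features": ["sqrt", "log2"]}

# Option 1: every candidate uses the same seed for its internal randomness
# (bootstrap samples, feature subsampling).
search_fixed = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=42),
    param_grid, cv=5,
)

# Option 2: the seed is left unset, so each fit draws fresh randomness.
search_free = GridSearchCV(
    RandomForestClassifier(n_estimators=200),
    param_grid, cv=5,
)

search_fixed.fit(X, y)
search_free.fit(X, y)
```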
Best Answer
The premise that the same random seed will lead two randomized algorithms to have more similar performance is extremely dubious (except perhaps for the most similar and specially structured of algorithms over the smallest of samples).
An analogy
Using a Monte-Carlo simulation, let's say you're trying to estimate a casino's house take in:
- Game A: blackjack where the dealer hits on soft 17
- Game B: blackjack where the dealer stands on soft 17
Would it make the comparison less noisy if Game A and Game B used the same order of cards (i.e. started with the same random seed)?
No! (Not in any meaningful way.) The moment Game A leads the dealer to take an additional card (compared to Game B), the games are no longer in sync: players are dealt different hands, cards that would have gone to the dealer go to a player instead, and so on. An offset of just one card makes a huge difference, and everything diverges from there.
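To make the divergence concrete, here is a toy sketch of the analogy (a simplified, dealer-only simulation drawing from an "infinite shoe", so the numbers are purely illustrative). Both games share the same seed, and the hands match only until the first soft 17 appears:

```python
import random

RANKS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]  # 11 = ace

def dealer_hand(rng, hit_soft_17):
    """Play one dealer hand, drawing cards from an 'infinite shoe' via rng."""
    cards, total, aces = [], 0, 0
    # Hit while below 17, or on a soft 17 if the hit-soft-17 rule applies.
    while total < 17 or (total == 17 and aces > 0 and hit_soft_17):
        card = rng.choice(RANKS)
        cards.append(card)
        total += card
        aces += (card == 11)
        while total > 21 and aces:  # demote an ace from 11 down to 1
            total -= 10
            aces -= 1
    return cards

seed = 0
rng_a = random.Random(seed)  # Game A: dealer hits soft 17
rng_b = random.Random(seed)  # Game B: dealer stands on soft 17
identical = 0
for _ in range(1000):
    hand_a = dealer_hand(rng_a, hit_soft_17=True)
    hand_b = dealer_hand(rng_b, hit_soft_17=False)
    identical += (hand_a == hand_b)
print(f"hands dealt identically in both games: {identical} / 1000")
```

The hands match only until the first soft 17 occurs; the single extra card taken in Game A offsets the shared card stream, and every hand after that differs between the two games.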
There may be some special-case algorithms where the small differences don't compound like this, but I would expect those to be unusual.