(5 points) Assume that we are interested in generating a model (e.g., a decision tree) from a sample of examples of a specific size drawn from some distribution. Assume further that we would like to investigate how sensitive the resulting model is to the actual choice of training examples (i.e., how the performance varies over different sets of training examples of the specific size). Assume that we have access to 100 training examples drawn from the underlying distribution. If we are interested in investigating how the performance varies for models generated from 90 examples, would we obtain a reliable estimate of the variance of the model performance by performing a 10-fold cross-validation? Motivate your answer.

Is 10-fold cross-validation reliable for estimating the variance of model performance?


#### Best Answer

10-fold cross-validation is known to be a good way to get unbiased or nearly unbiased estimates of the error rate for classification or prediction based on a training set of a given size. If that is what you mean, then the answer to your first question is yes.

If by variance you mean how the performance of the decision trees, which differ because their training samples differ, varies from one training sample of size 90 to another, I am not sure. But I do think you could assess that with the bootstrap.
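
One way the bootstrap suggestion might look in practice is sketched below. Everything in it is an assumption made for illustration: the synthetic one-dimensional data, the depth-1 decision stump standing in for a decision tree, and the helper names `make_data`, `fit_stump`, `accuracy`, and `bootstrap_accuracies`. Each round draws 90 of the 100 examples with replacement, trains on them, and scores the model on the examples that were not drawn. Note that a bootstrap sample contains duplicates, so it holds fewer than 90 distinct examples.

```python
import random
import statistics


def make_data(n, rng):
    """Hypothetical 1-D data: label is 1 when x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data


def fit_stump(train):
    """Depth-1 'tree': choose the threshold with the fewest training errors."""
    best_t, best_err = 0.0, len(train) + 1
    for t, _ in train:
        err = sum(int(x > t) != y for x, y in train)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def accuracy(threshold, test):
    """Fraction of test examples the stump classifies correctly."""
    return sum(int(x > threshold) == y for x, y in test) / len(test)


def bootstrap_accuracies(data, train_size=90, rounds=200, seed=0):
    """Train on bootstrap samples and score each model on its out-of-bag examples."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(train_size)]  # with replacement
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if not oob:  # skip the (very unlikely) round with no held-out examples
            continue
        model = fit_stump([data[i] for i in idx])
        scores.append(accuracy(model, oob))
    return scores


data = make_data(100, random.Random(42))
scores = bootstrap_accuracies(data)
print("mean accuracy:", statistics.mean(scores))
print("variance of accuracy:", statistics.variance(scores))
```

The spread of `scores` reflects variation in both the training sample and the out-of-bag test set, so it should be read as a rough indication rather than a clean estimate of training-set sensitivity alone. For comparison, the ten training sets in 10-fold cross-validation on 100 examples overlap heavily: any two of them share 80 of their 90 examples.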
