I'm using Eureqa, a machine learning tool, to fit a formula to my data. I found that the formula fits my test data better than my training data! Is this abnormal?

#### Best Answer

It is unusual, but it can happen. First of all, a model is usually only able to mimic the data approximately, and how well it fits can vary across different regions of the predictor variables' domain. For instance, suppose that the variable $y$ is explained by a linear function of $x$ plus noise. Suppose the $y$ values were measured for $0 < x < 20$ and the data are divided into a training set $M_\text{train}=\{(x,y):x<10\}$ and a test set $M_\text{test}=\{(x,y):x\geq 10\}$. Even if the noise is larger in the training set ($x<10$), you can still get an accurate estimate of the slope parameter for the whole domain of $x$. Since the model is good, it also predicts the $y$ values in the test set well, and because the noise term is smaller in the test set, the prediction error (out-of-sample error) for the test set is smaller as well. In summary:

If the model is a good description of reality and the noise (e.g. measurement errors) in the training set is larger than in the test set, the out-of-sample error (prediction error for the test set) can be smaller than the in-sample error (description error for the training set).
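The linear example can be checked with a small simulation. The following is only a sketch under assumed values: the slope, intercept, noise levels, sample size, and the function name `simulate_split_errors` are all made up for illustration and do not come from the question or from Eureqa.

```python
import numpy as np

def simulate_split_errors(n=2000, slope=2.0, intercept=1.0,
                          noise_train=3.0, noise_test=0.5, seed=0):
    """Hypothetical helper: fit a line on x < 10, evaluate it on x >= 10.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 20.0, size=n)
    is_train = x < 10  # M_train = {(x, y): x < 10}, M_test = {(x, y): x >= 10}

    # The noise standard deviation is larger in the training region.
    sigma = np.where(is_train, noise_train, noise_test)
    y = intercept + slope * x + rng.normal(0.0, 1.0, size=n) * sigma

    # Ordinary least-squares line fitted on the training set only.
    coef = np.polyfit(x[is_train], y[is_train], deg=1)

    def rmse(mask):
        resid = y[mask] - np.polyval(coef, x[mask])
        return float(np.sqrt(np.mean(resid ** 2)))

    return rmse(is_train), rmse(~is_train)

train_rmse, test_rmse = simulate_split_errors()
print(f"in-sample RMSE:     {train_rmse:.3f}")
print(f"out-of-sample RMSE: {test_rmse:.3f}")
```

With these assumed settings, the in-sample RMSE is dominated by the large training noise, while the out-of-sample RMSE mostly reflects the smaller test noise plus a modest extrapolation error, so the test error comes out smaller.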

Whether this behaviour emerges also depends on the way the test set and the training set are constructed. If both sets are drawn at random from the same population, and this procedure is repeated several times to estimate the out-of-sample error, then the out-of-sample error averaged over the repetitions should not be smaller than the in-sample error. Alternatively, if the training set was obtained from an inaccurate measurement and the test set from a more accurate one, the error for the test data could be smaller than for the training data, assuming of course that the model is good.
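The random-split case can be illustrated in the same way. This is a minimal sketch, assuming a linear population with constant noise and small samples. The function name `repeated_random_splits` and every numeric setting are hypothetical choices for illustration.

```python
import numpy as np

def repeated_random_splits(n_total=40, n_train=20, n_repeats=2000,
                           slope=2.0, intercept=1.0, noise=1.0, seed=0):
    """Hypothetical helper: draw a sample, split it at random, fit, repeat.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    train_mse = np.empty(n_repeats)
    test_mse = np.empty(n_repeats)
    for i in range(n_repeats):
        # Training and test points come from the same population.
        x = rng.uniform(0.0, 20.0, size=n_total)
        y = intercept + slope * x + rng.normal(0.0, noise, size=n_total)

        # Random assignment of points to the two sets.
        order = rng.permutation(n_total)
        train, test = order[:n_train], order[n_train:]

        coef = np.polyfit(x[train], y[train], deg=1)
        train_mse[i] = np.mean((y[train] - np.polyval(coef, x[train])) ** 2)
        test_mse[i] = np.mean((y[test] - np.polyval(coef, x[test])) ** 2)
    return train_mse, test_mse

train_mse, test_mse = repeated_random_splits()
print(f"mean in-sample MSE:     {train_mse.mean():.3f}")
print(f"mean out-of-sample MSE: {test_mse.mean():.3f}")
print(f"share of splits with test < train: {np.mean(test_mse < train_mse):.2f}")
```

Averaged over the repetitions, the out-of-sample error is the larger of the two, yet a noticeable share of individual splits still shows a smaller test error. A single split with a better test fit is therefore not, by itself, a sign that something is wrong.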
