Overfitting is when a model has memorized the training data and therefore does not perform well on real-world cases.
Okay, say that I had some training points which look like this:
What if the red curve were the actual 'real-world' relationship, and I found this exact model through a learning algorithm based on those training observations?
Has my model overfit by definition, even though it is the 'real-world' relationship? I am assuming yes, but I just want to make sure.
Thanks
Best Answer
I think it may be useful to rephrase the definition of overfitting to something like:
A model that does not generalize well to real-world cases although it fits the training data well.
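To make that definition concrete, here is a minimal sketch (Python with NumPy; the sine curve, the noise level, and the degree-9 polynomial are purely illustrative assumptions, not taken from your plot): a very flexible model reproduces 10 training points almost exactly, yet does much worse on fresh data from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true relationship is a smooth sine curve,
# observed with noise at only 10 training points.
def true_f(x):
    return np.sin(2 * np.pi * x)

x_train = np.sort(rng.uniform(0, 1, 10))
y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)

# A degree-9 polynomial can pass (almost) exactly through all 10 points ...
model = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

# ... but on fresh data from the same process it does much worse.
x_test = rng.uniform(0, 1, 1000)
y_test = true_f(x_test) + rng.normal(0, 0.2, x_test.size)

train_mse = np.mean((model(x_train) - y_train) ** 2)
test_mse = np.mean((model(x_test) - y_test) ** 2)

print(f"training MSE: {train_mse:.4f}")  # near zero: fits the training data well
print(f"test MSE:     {test_mse:.4f}")   # much larger: does not generalize
```

It is the gap between the two errors, not the training fit itself, that the rephrased definition points at.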
As for your example:
- If the real world looks like the red line, there is by definition no overfitting.
But at the same time, if the black dots are all the real-world data you have, you probably still cannot prove this: in real-world situations, 10 cases are just not enough to prove that a function of the shown complexity was successfully fitted.
To give you an idea about one real-world field: in analytical chemistry, a series of 10 concentration steps covering your desired range of analyte concentrations is usually required even to show that your method yields a linear response.
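The same point can be made with a small simulation instead of chemistry (again a sketch in Python/NumPy; the two "worlds" and the degree-9 polynomial are assumptions chosen only for illustration): a flexible model fits 10 points nearly perfectly whether the true relationship is genuinely wiggly or just a simple trend plus noise, so the training fit alone cannot tell you which situation you are in.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical "real worlds": a simple linear trend with noise,
# and a genuinely wiggly relationship (like the red curve in the question).
def simple_world(x):
    return x + rng.normal(0, 0.2, x.size)

def wiggly_world(x):
    return np.sin(6 * np.pi * x) + rng.normal(0, 0.05, x.size)

x = np.sort(rng.uniform(0, 1, 10))

for name, world in [("simple + noise", simple_world), ("genuinely wiggly", wiggly_world)]:
    y = world(x)
    fit = np.polynomial.Polynomial.fit(x, y, deg=9)       # flexible model, 10 coefficients
    train_mse = np.mean((fit(x) - y) ** 2)
    print(f"{name:>17}: training MSE = {train_mse:.2e}")  # essentially zero in both cases
```

Only held-out data (or far more than 10 observations) could separate the two cases, which is why finding the red curve from those points would not, on its own, prove that you recovered the real-world relationship.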