For which kinds of supervised machine learning techniques is it possible to estimate, once the model has been trained on a dataset, how uncertain it is about its predictions for a given level or range of the predictor?
I can imagine that this is possible for, e.g., a random forest, by looking at the variance of the votes that the trees cast for a data point in the evaluation dataset. On the other hand, it seems to me impossible to estimate model uncertainty for linear regression and similar methods.
Could anyone explain for which machine learning techniques this can be done or point me to the relevant literature?
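To make the random-forest idea concrete, here is a rough sketch of what I have in mind; scikit-learn's `RandomForestClassifier` and the synthetic dataset are just illustrative choices on my part:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and forest settings are purely illustrative.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Collect each individual tree's vote for one evaluation point.
x_eval = X[:1]
votes = np.array([tree.predict(x_eval)[0] for tree in forest.estimators_])

# The spread of the votes acts as a heuristic uncertainty measure:
# high vote variance = low confidence in the prediction.
print("vote fractions per class:", np.bincount(votes.astype(int)) / len(votes))
print("vote variance:", votes.var())
```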
Best Answer
Many models can actually provide you with an uncertainty measure. First of all:

- Naive Bayes directly models the probability P(y|x), which is exactly what you are asking for.
- Support Vector Machine defines a hyperplane, and the distance of a point to this hyperplane is a certainty measure (the closer the point is to the hyperplane, the less certain the model is). In libraries like Python's `sklearn` you can access it via `decision_function(x)`, whose value is proportional to the signed distance of x to the hyperplane (the `intercept_` attribute holds the hyperplane's offset).
- Multilayer Neural Network: if you train a network for an M-class classification task with M output neurons (where the expected output for an element of the first class is `1 0 0 ...`, for the second `0 1 0 ...`, and so on), the output neurons' values can be used as a certainty measure (`0 0.7 0 ...` can be interpreted as "quite certainly a member of the second class"), but of course more interesting measures can be used here as well (e.g. the Kullback–Leibler divergence).

A sketch of these three measures follows below.
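As a minimal, self-contained sketch of the three measures above: the answer mentions `sklearn` only for the SVM case, so the synthetic dataset, the model settings, and the use of `GaussianNB`, `SVC`, and `MLPClassifier` here are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Illustrative synthetic data.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
x = X_test[:1]  # a single evaluation point

# 1. Naive Bayes: predict_proba returns the modelled P(y|x) directly.
nb = GaussianNB().fit(X_train, y_train)
print("P(y|x):", nb.predict_proba(x))

# 2. SVM: decision_function returns w.x + b, which is proportional to the
#    signed distance to the hyperplane; values near 0 mean low certainty.
svm = SVC(kernel="linear").fit(X_train, y_train)
print("decision function value:", svm.decision_function(x))

# 3. Multilayer network: the per-class output values can be read as a
#    certainty measure over the M classes.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X_train, y_train)
print("per-class outputs:", mlp.predict_proba(x))
```

Each print statement reads out the kind of certainty measure described in the corresponding list item, for one evaluation point.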