I have learned that, when dealing with data using a model-based approach, the first step is to model the data-generating procedure as a statistical model. The next step is to develop an efficient inference/learning algorithm based on that statistical model. So I want to ask: which statistical model is behind the support vector machine (SVM) algorithm?
Best Answer
You can often write a model that corresponds to a loss function. (Here I'm going to talk about SVM regression rather than SVM classification; it's particularly simple.)
For example, in a linear model, if your loss function is $\sum_i g(\varepsilon_i) = \sum_i g(y_i - x_i'\beta)$, then minimizing it corresponds to maximum likelihood for $f \propto \exp(-a\,g(\varepsilon)) = \exp(-a\,g(y - x'\beta))$. (Here I have a linear kernel.)
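A minimal numerical check of this correspondence, assuming the simplest linear model (a location model, $x_i = 1$, so $\beta$ is a scalar) and a grid search in place of a real optimizer. Because $\log f = -a\,g(\varepsilon) - \log Z$ and $Z$ does not depend on $\beta$, the $\beta$ minimizing $\sum_i g(\varepsilon_i)$ should be the one maximizing the log-likelihood. The choices $g(\varepsilon) = |\varepsilon|$, $a = 2$ and the simulated data are illustrative assumptions.

```python
import numpy as np

def total_loss(beta, y, g):
    """Sum of g(residual) for the location model y_i = beta + eps_i."""
    return np.sum(g(y - beta))

def log_likelihood(beta, y, g, a, log_z):
    """Log-likelihood when eps has density exp(-a * g(eps)) / Z, with log_z = log Z."""
    return np.sum(-a * g(y - beta) - log_z)

rng = np.random.default_rng(0)
y = rng.laplace(loc=3.0, scale=1.0, size=201)

g = np.abs                 # illustrative loss: absolute error
a = 2.0                    # illustrative multiplier; any a > 0 gives the same maximizer
log_z = np.log(2.0 / a)    # for g = |e|, Z = integral of exp(-a|e|) de = 2/a

grid = np.linspace(0.0, 6.0, 6001)   # candidate values of beta
beta_min_loss = grid[np.argmin([total_loss(b, y, g) for b in grid])]
beta_max_lik = grid[np.argmax([log_likelihood(b, y, g, a, log_z) for b in grid])]
```

Both searches land on the same grid point, and with absolute-error loss that point is (up to grid resolution) the sample median, the familiar maximum-likelihood location estimate for Laplace errors.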
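If I recall correctly, SVM regression uses the so-called ε-insensitive loss, which is zero inside a band around zero and linear outside it. Writing the band half-width as $\delta$ (since $\varepsilon$ already denotes the residual here):

$$g(\varepsilon) = \max(0,\ |\varepsilon| - \delta)$$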
That corresponds to a density that is uniform in the middle with exponential tails (as we see by exponentiating its negative, or some multiple of its negative).
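A small sketch of that density, assuming the loss is $\max(0, |\varepsilon| - \delta)$ with threshold $\delta$ and multiplier $a$ (the values $\delta = 1$, $a = 2$ are illustrative). Exponentiating $-a$ times the loss gives a flat piece of width $2\delta$ plus two exponential tails, each contributing $1/a$ to the normalizing constant, so $Z = 2\delta + 2/a$.

```python
import numpy as np

def eps_insensitive_loss(e, delta):
    """SVM-regression loss: zero on [-delta, delta], growing linearly outside."""
    return np.maximum(0.0, np.abs(e) - delta)

def eps_insensitive_density(e, delta, a):
    """Density proportional to exp(-a * loss(e)).

    The flat middle contributes 2*delta to the normalizing constant and each
    exponential tail contributes 1/a, so Z = 2*delta + 2/a.
    """
    z = 2.0 * delta + 2.0 / a
    return np.exp(-a * eps_insensitive_loss(e, delta)) / z

delta, a = 1.0, 2.0
e = np.linspace(-30.0, 30.0, 600_001)
f = eps_insensitive_density(e, delta, a)
# Trapezoid rule written out by hand, to avoid depending on a specific NumPy version.
total_mass = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(e))
```

Setting `delta = 0` recovers the Laplace (double-exponential) density, the usual likelihood counterpart of absolute-error loss.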
There's a three-parameter family of these: the corner location (the relative insensitivity threshold), plus location and scale.
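One way to write that family, as an assumed parameterization: absorb the multiplier into a scale $s = 1/a$ and measure the threshold in units of scale, $c = \delta / s$, giving $f(x) \propto \exp(-\max(0, |x - \mu|/s - c))$ with location $\mu$. The flat middle then carries probability $c/(c+1)$ and each tail $1/(2(c+1))$, which yields the simple mixture sampler below. The function names and parameter values are hypothetical.

```python
import numpy as np

def log_density(x, mu, s, c):
    """Log of exp(-max(0, |x - mu|/s - c)) / (s * (2c + 2))."""
    z = (x - mu) / s
    return -np.maximum(0.0, np.abs(z) - c) - np.log(s * (2.0 * c + 2.0))

def sample_family(mu, s, c, size, rng):
    """Uniform middle with probability c/(c+1); otherwise an exponential tail beyond a corner."""
    in_middle = rng.random(size) < c / (c + 1.0)
    middle = rng.uniform(-c, c, size)
    side = np.where(rng.random(size) < 0.5, -1.0, 1.0)
    tail = side * (c + rng.exponential(1.0, size))
    return mu + s * np.where(in_middle, middle, tail)

rng = np.random.default_rng(1)
x = sample_family(mu=5.0, s=2.0, c=1.5, size=200_000, rng=rng)
```

Up to the constant $-\log(s(2c+2))$, the log-density is minus the SVM loss applied to the standardized residual $(x - \mu)/s$, so the three parameters map onto location, scale and the relative insensitivity threshold.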
It's an interesting density. If I recall rightly from looking at that particular distribution a few decades ago, a good estimator of location for it is the average of the two symmetrically placed quantiles corresponding to where the corners are (e.g. the midhinge would give a good approximation to the MLE for one particular choice of the constant in the SVM loss). A similar estimator for the scale parameter would be based on the difference of those two quantiles, while the third parameter basically corresponds to working out which percentile the corners are at (this might be chosen rather than estimated, as it often is for SVM).
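A sketch of those quantile-based estimators, assuming the parameterization $f(x) \propto \exp(-\max(0, |x - \mu|/s - c))$ with location $\mu$, scale $s$ and corner $c$ in units of scale. Each tail holds probability $1/(2(c+1))$, so the corners $\mu \mp s c$ sit at quantiles $q = 1/(2(c+1))$ and $1 - q$; location is estimated by averaging those two quantiles and scale by dividing their difference by $2c$. For $c = 1$ the corners fall at the quartiles, which is the midhinge case. The inverse-CDF function only produces test data, and the code does not check how close these estimators come to the MLE; all names are hypothetical.

```python
import numpy as np

def corner_quantile(c):
    """Probability below the lower corner mu - s*c; each tail holds 1/(2(c+1))."""
    return 1.0 / (2.0 * (c + 1.0))

def quantile_estimates(x, c):
    """Location: average of the two corner quantiles. Scale: their spread divided by 2c."""
    q = corner_quantile(c)
    lo, hi = np.quantile(x, [q, 1.0 - q])
    return 0.5 * (lo + hi), (hi - lo) / (2.0 * c)

def inverse_cdf(u, mu, s, c):
    """Quantile function of the density, used here only to simulate test data."""
    q = corner_quantile(c)
    u = np.asarray(u, dtype=float)
    z = np.empty_like(u)
    lower, upper = u < q, u > 1.0 - q
    middle = ~(lower | upper)
    z[lower] = -c + np.log(u[lower] / q)                 # left exponential tail
    z[middle] = -c + (u[middle] - q) * (2.0 * c + 2.0)   # flat middle
    z[upper] = c - np.log((1.0 - u[upper]) / q)          # right exponential tail
    return mu + s * z

rng = np.random.default_rng(2)
x = inverse_cdf(rng.random(100_000), mu=5.0, s=2.0, c=1.0)
loc_hat, scale_hat = quantile_estimates(x, c=1.0)   # c = 1: quartiles, i.e. the midhinge
```

Inverting `corner_quantile` gives $c = 1/(2q) - 1$, which is one way to read "working out which percentile the corners are at" as a statement about the third parameter.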
So, for SVM regression at least, it seems pretty straightforward, provided we choose to get our estimators by maximum likelihood.
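For the linear model, maximum likelihood for $\beta$ under this density (with the threshold $\delta$ and multiplier $a$ held fixed) amounts to minimizing $\sum_i \max(0, |y_i - x_i'\beta| - \delta)$. Below is a minimal sketch that solves this exactly as a linear program with `scipy.optimize.linprog`. The LP formulation, the function name and the simulated data are illustrative choices, not how SVM software works internally, and the sketch leaves out the $\|\beta\|^2$ penalty that SVM regression adds on top of the loss.

```python
import numpy as np
from scipy.optimize import linprog

def eps_insensitive_fit(X, y, delta):
    """Minimize sum_i max(0, |y_i - x_i' beta| - delta) over beta.

    Slack variables t_i >= 0 satisfy t_i >= y_i - x_i'beta - delta and
    t_i >= -(y_i - x_i'beta) - delta; minimizing sum(t) gives the fit.
    """
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.ones(n)])     # only the slacks are penalized
    eye = np.eye(n)
    A_ub = np.block([[-X, -eye], [X, -eye]])
    b_ub = np.concatenate([delta - y, delta + y])
    bounds = [(None, None)] * p + [(0, None)] * n         # beta free, slacks non-negative
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 500)
X = np.column_stack([np.ones_like(x), x])    # intercept and slope
y = 1.0 + 2.0 * x + rng.laplace(0.0, 0.5, 500)
beta_hat = eps_insensitive_fit(X, y, delta=0.3)
```

Because the objective is piecewise linear and convex, the LP returns an exact minimizer; a generic derivative-based optimizer would have to cope with the kinks at $|r| = \delta$.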
(In case you're about to ask … I have no reference for this particular connection to SVM: I just worked that out now. It's so simple, however, that dozens of people will have worked it out before me so no doubt there are references for it — I've just never seen any.)