Solved – the statistical model behind the SVM algorithm

I have learned that, when dealing with data using a model-based approach, the first step is to describe the data-generating process as a statistical model, and the next step is to develop an efficient/fast inference/learning algorithm based on that statistical model. So I want to ask: which statistical model is behind the support vector machine (SVM) algorithm?

Best Answer

You can often write down a model that corresponds to a loss function (here I'm going to talk about SVM regression rather than SVM classification; it's particularly simple).

For example, in a linear model, if your loss function is $\sum_i g(\varepsilon_i) = \sum_i g(y_i - x_i'\beta)$, then minimizing that will correspond to maximum likelihood for $f \propto \exp(-a\,g(\varepsilon)) = \exp(-a\,g(y - x'\beta))$. (Here I have a linear kernel.)
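
A one-line sketch of why that correspondence holds (assuming independent errors, with $c$ the normalizing constant of $f$): the log-likelihood is

$$
\ell(\beta) = \sum_i \log f(y_i - x_i'\beta) = n\log c - a\sum_i g(y_i - x_i'\beta),
$$

and since $n\log c$ doesn't involve $\beta$ and $a>0$, maximizing $\ell(\beta)$ over $\beta$ is exactly the same as minimizing $\sum_i g(y_i - x_i'\beta)$.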

If I recall correctly, SVM regression has a loss function like this:

[Plot: the $\epsilon$-insensitive loss]

That corresponds to a density that is uniform in the middle with exponential tails (as we see by exponentiating its negative, or some multiple of its negative).
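
Concretely, if the loss in that plot is the usual $\epsilon$-insensitive one, $g(u) = \max(0,\,|u| - \epsilon)$ (my assumption, since the figure isn't reproduced here), exponentiating its negative gives

$$
f(u) \propto \exp\bigl(-a\max(0,\,|u| - \epsilon)\bigr),
$$

which is flat on $[-\epsilon, \epsilon]$ and decays exponentially outside that interval.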

[Plot: the corresponding density]

There's a three-parameter family of these: corner location (the relative insensitivity threshold) plus location and scale.
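
Written out with the location and scale put in (this is just one convenient parametrization, my own choice rather than anything standard), the family would be

$$
f(y;\mu,\sigma,\epsilon) = \frac{1}{2\sigma(1+\epsilon)}\,\exp\!\left(-\max\!\left(0,\;\frac{|y-\mu|}{\sigma}-\epsilon\right)\right),
$$

i.e. uniform on $[\mu-\epsilon\sigma,\,\mu+\epsilon\sigma]$ with exponential tails (unit rate in units of $\sigma$) on either side.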

It's an interesting density; if I recall rightly from looking at that particular distribution a few decades ago, a good estimator of its location is the average of two symmetrically placed quantiles corresponding to where the corners are (e.g. the midhinge would give a good approximation to the MLE for one particular choice of the constant in the SVM loss). A similar estimator of the scale parameter would be based on the difference of those quantiles, while the third parameter corresponds basically to working out which percentile the corners sit at (this might be chosen rather than estimated, as it often is for SVM).
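
Here is a minimal numerical sketch of those quantile-based estimators, using the parametrization above (the sampler, the function names, and the choice $\epsilon = 1$, which puts the corners at the quartiles so that the location estimate is just the midhinge, are all mine rather than anything standard):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(mu, sigma, eps, n):
    """Draw from the density proportional to exp(-max(0, |y - mu|/sigma - eps)):
    uniform on [mu - eps*sigma, mu + eps*sigma], exponential tails outside."""
    p_mid = eps / (1.0 + eps)                 # probability mass of the flat middle
    in_mid = rng.uniform(size=n) < p_mid
    mid = rng.uniform(-eps, eps, size=n)      # flat part
    tail = (eps + rng.exponential(size=n)) * rng.choice([-1.0, 1.0], size=n)
    return mu + sigma * np.where(in_mid, mid, tail)

y = sample(mu=10.0, sigma=2.0, eps=1.0, n=200_000)

# The corners sit at the p-th and (1-p)-th quantiles, where p = 1 / (2*(1 + eps))
# is the probability in one tail. With eps treated as known (as it usually is
# for SVM), location and scale can be read off those two sample quantiles.
eps = 1.0
p = 1.0 / (2.0 * (1.0 + eps))
q_lo, q_hi = np.quantile(y, [p, 1.0 - p])

loc_hat = (q_lo + q_hi) / 2.0            # average of the two symmetric quantiles
scale_hat = (q_hi - q_lo) / (2.0 * eps)  # their difference, rescaled by the corner position

print(loc_hat, scale_hat)  # should land close to 10 and 2
```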

So, at least for SVM regression, the correspondence seems pretty straightforward, provided we're choosing to get our estimators by maximum likelihood.

(In case you're about to ask … I have no reference for this particular connection to SVM: I just worked that out now. It's so simple, however, that dozens of people will have worked it out before me so no doubt there are references for it — I've just never seen any.)
