I understand the derivation behind the support vector machine, but I have a doubt about the constraint equation.
Why do we require the constraint $\geq 1$ if $y_i = 1$ and $\leq -1$ if $y_i = -1$?
Can we use any arbitrary constant instead of 1? If not, what is the rationale behind this particular value?
Any help is highly appreciated.
Best Answer
Yes, you can have any arbitrary, strictly positive constant instead of 1.
Why? First some background.
Math and separating hyperplane:
A support vector machine attempts to find a separating hyperplane between sets $X$ and $Y$. Mathematically, the condition for a separating hyperplane is:
$$ \boldsymbol{w} \cdot \boldsymbol{x}_i - b < 0 \quad \quad \boldsymbol{w} \cdot \boldsymbol{y}_i - b > 0 $$
Observe that the inequalities are strict!
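Here is a minimal sketch (not from the original answer; the toy point sets and the choice of $\boldsymbol{w}$, $b$ are made up) that checks this strict separation condition numerically with NumPy:

```python
import numpy as np

# Two toy point sets and a candidate hyperplane (w, b) -- values made up for illustration.
X = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 2.5]])   # points that should satisfy w.x_i - b < 0
Y = np.array([[4.0, 5.0], [5.0, 6.0], [4.5, 5.5]])   # points that should satisfy w.y_i - b > 0
w = np.array([1.0, 1.0])
b = 7.0

# Strict separating-hyperplane condition from the answer above.
separates = np.all(X @ w - b < 0) and np.all(Y @ w - b > 0)
print(separates)  # True for this particular (w, b)
```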
Numerical issues and practical solution:
Numerically, this formulation has practical problems. If the inequalities weren't strict, $\boldsymbol{w} = \boldsymbol{0}$, $b = 0$ would be a trivial solution. Numerical optimization routines may give bizarre answers to this problem; standard floating point math isn't infinitely precise, etc.
What to do? Let's replace the strict inequalities with non-strict inequalities plus some separation constant $t>0$: $$ \boldsymbol{w} \cdot \boldsymbol{x}_i - b \leq -t \quad \quad \boldsymbol{w} \cdot \boldsymbol{y}_i - b \geq t $$ Yay! Numerical optimization can handle this. Also observe that since $\boldsymbol{w}$ and $b$ are choice variables, the scale of $t$ really doesn't matter. It's totally arbitrary, so we can just make it simple for ourselves and choose 1. (You could even choose different positive values for the two inequalities; it doesn't matter.)
$$ \boldsymbol{w} \cdot \boldsymbol{x}_i - b \leq -1 \quad \quad \boldsymbol{w} \cdot \boldsymbol{y}_i - b \geq 1 $$
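To see concretely that the scale of $t$ doesn't matter, here is a small sketch (again with made-up toy data): any $(\boldsymbol{w}, b)$ that is feasible for some $t > 0$ can be rescaled to be feasible for 1, and it still describes the same hyperplane because $\boldsymbol{w} \cdot \boldsymbol{x} - b = 0$ and $(\boldsymbol{w}/t) \cdot \boldsymbol{x} - (b/t) = 0$ have identical solution sets.

```python
import numpy as np

# Toy data and a hyperplane that happens to separate with constant t = 2 (values made up).
X = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 2.5]])
Y = np.array([[4.0, 5.0], [5.0, 6.0], [4.5, 5.5]])
w, b, t = np.array([1.0, 1.0]), 7.0, 2.0

# Feasible for the constraints with separation constant t.
assert np.all(X @ w - b <= -t) and np.all(Y @ w - b >= t)

# Dividing w and b by t gives a point that is feasible for the constraints with constant 1,
# and it defines exactly the same hyperplane.
w1, b1 = w / t, b / t
assert np.all(X @ w1 - b1 <= -1) and np.all(Y @ w1 - b1 >= 1)
print(w1, b1)  # [0.5 0.5] 3.5
```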
Other interpretation:
As your text explains, another interpretation of this is that you're fitting two parallel hyperplanes, one touching the $X$ set, one touching the $Y$ set, with some distance between them.
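As a supplementary note (standard geometry, not spelled out above): the distance between the two parallel hyperplanes $\boldsymbol{w} \cdot \boldsymbol{x} - b = 1$ and $\boldsymbol{w} \cdot \boldsymbol{x} - b = -1$ is

$$ \frac{|1 - (-1)|}{\lVert \boldsymbol{w} \rVert} = \frac{2}{\lVert \boldsymbol{w} \rVert}, $$

which is why maximizing this margin is equivalent to minimizing $\lVert \boldsymbol{w} \rVert$ subject to the constraints above.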