Solved – Understanding the different formulations for SVM

I've been working with kernlab for more than a year now, but I've always stuck to the vanilla cost (C-svc) formulation for classification.

Of course, kernlab offers other formulations as well; the manual briefly lists several for classification.
I know C-svc well, I understand the principles of nu-svc, and I have only a vague idea of what a native multi-class formulation entails.

Could someone please shed some light on these different formulations and how they compare, where applicable, to C-svc? Information on the trade-offs involved (robustness, sparseness, sensitivity to high dimensionality or class imbalance) would be greatly appreciated.

  • C-svc: C (Cost) classification
  • nu-svc: $\nu$ (nSV) classification
  • C-bsvc: bound-constrained SVM classification
  • spoc-svc: Crammer-Singer native multi-class
  • kbb-svc: Weston-Watkins native multi-class
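
For context, these names are the values of the `type` argument of kernlab's `ksvm`. A minimal sketch of how one would fit each (the `rbfdot` kernel and the `C`/`nu` values below are arbitrary choices, not recommendations):

```r
library(kernlab)

data(iris)

## C-svc: standard soft-margin classification, cost parameter C
m_c    <- ksvm(Species ~ ., data = iris, type = "C-svc",    kernel = "rbfdot", C = 1)

## nu-svc: nu reparametrizes the cost and bounds the fraction of SVs
m_nu   <- ksvm(Species ~ ., data = iris, type = "nu-svc",   kernel = "rbfdot", nu = 0.2)

## C-bsvc: bound-constrained variant (offset folded into the objective)
m_b    <- ksvm(Species ~ ., data = iris, type = "C-bsvc",   kernel = "rbfdot", C = 1)

## spoc-svc / kbb-svc: native multi-class formulations
m_spoc <- ksvm(Species ~ ., data = iris, type = "spoc-svc", kernel = "rbfdot", C = 1)
m_kbb  <- ksvm(Species ~ ., data = iris, type = "kbb-svc",  kernel = "rbfdot", C = 1)

## compare how many support vectors each formulation keeps
sapply(list(m_c, m_nu, m_b, m_spoc, m_kbb), nSV)
```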

C-SVM:

$$ \begin{align} &\text{minimize} &t(\mathbf w,b,\xi)= {1\over2}\|\mathbf w\|_2^2+{C\over m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} &\left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge1-\xi_i & (\forall i=1,\dots,m) \\ \xi_i \ge0 \end{matrix}\right. \end{align} $$

C-svc is the standard soft-margin SVM. Its dual is:

$$ \begin{align} &\text{minimize} &t(\alpha)= {1\over2}\alpha^T\hat Q\alpha-\mathbf 1^T\alpha \\ &\text{subject to} &\left\{\begin{matrix} y^T\alpha=0 \\ \mathbf 0\le\alpha\le {C\over m}\mathbf 1 \end{matrix}\right. \end{align} $$

With $\hat Q_{ij} \equiv y_i y_j\, \Phi(x_i)^T\Phi(x_j)$.
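
As a quick sanity check (a sketch, assuming a binary problem so there is a single set of dual variables), kernlab exposes the dual solution: `coef()` returns $y_i\alpha_i$ on the support vectors and `alpha()` their magnitudes, so the equality constraint $y^T\alpha=0$ can be verified directly. Note that whether the box bound is $C$ or $C/m$ depends on how the implementation scales the objective.

```r
library(kernlab)

## two-class subset of iris -> a single binary C-svc problem
iris2 <- droplevels(iris[iris$Species != "setosa", ])
m <- ksvm(Species ~ ., data = iris2, type = "C-svc", kernel = "rbfdot", C = 1)

## coef() gives y_i * alpha_i on the support vectors;
## the dual constraint y^T alpha = 0 means these sum to (numerically) zero
sum(unlist(coef(m)))

## alpha() gives the alpha_i themselves: all positive and bounded above
range(unlist(alpha(m)))
```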


C-BSVM:

$$ \begin{align} &\text{minimize} &t(\mathbf w,b,\xi)= {1\over2}\|\mathbf w\|_2^2+{1\over2}b^2+{C\over m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} &\left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge1-\xi_i & (\forall i=1,\dots,m) \\ \xi_i \ge0 \end{matrix}\right. \end{align} $$

C-bsvc adds the offset $b$ to the objective via the ${1\over2}b^2$ term. This simplifies the dual: the equality constraint $y^T\alpha=0$ disappears, and only bound constraints remain.

$$ \begin{align} &\text{minimize} &t(\alpha)= {1\over2}\alpha^T Q\alpha-\mathbf 1^T\alpha \\ &\text{subject to} &\mathbf 0\le\alpha\le {C\over m}\mathbf 1 \end{align} $$

With $Q_{ij} \equiv y_i y_j\left(\Phi(x_i)^T\Phi(x_j)+1\right)$.
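
To see where the $+1$ comes from: setting the derivatives of the Lagrangian with respect to $\mathbf w$ and $b$ to zero gives

$$ \mathbf w=\sum_{i=1}^{m}\alpha_i y_i\Phi(x_i), \qquad b=\sum_{i=1}^{m}\alpha_i y_i, $$

and substituting back into ${1\over2}\|\mathbf w\|_2^2+{1\over2}b^2$ yields

$$ {1\over2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\left(\Phi(x_i)^T\Phi(x_j)+1\right)={1\over2}\alpha^T Q\alpha, $$

so penalizing the offset is equivalent to using the kernel $K(x_i,x_j)+1$.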


$\nu$-SVM:

$$ \begin{align} &\text{minimize} &t(\mathbf w,b,\xi,\rho)= {1\over2}\|\mathbf w\|_2^2-\nu\rho+{1\over m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} &\left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge\rho-\xi_i & (\forall i=1,\dots,m) \\ \xi_i \ge0, \quad \rho\ge0 \end{matrix}\right. \end{align} $$

nu-svc has the following dual:

$$ \begin{align} &\text{minimize} &t(\alpha)= {1\over2}\alpha^T\hat Q\alpha \\ &\text{subject to} &\left\{\begin{matrix} y^T\alpha=0 \\ \mathbf 0\le\alpha\le {1\over m}\mathbf 1 \\ \|\alpha\|_1=\sum_{i=1}^{m}\alpha_i\ge\nu \end{matrix}\right. \end{align} $$

Since each $\alpha_i$ is at most $1/m$ and the $\alpha_i$ sum to at least $\nu$, at least $\nu m$ entries of $\alpha$ must be non-zero. In other words, $\nu$ is a lower bound on the fraction of support vectors (and, by the same kind of argument, an upper bound on the fraction of margin errors).
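
A quick empirical check of this bound (a sketch; the $\nu$ values and the binary iris subset are arbitrary choices): the fraction of training points retained as support vectors should never drop below $\nu$.

```r
library(kernlab)

iris2 <- droplevels(iris[iris$Species != "setosa", ])

## fraction of training points that end up as support vectors,
## for a few values of nu -- each fraction should be >= nu
sapply(c(0.1, 0.3, 0.5), function(nu) {
  m <- ksvm(Species ~ ., data = iris2, type = "nu-svc", kernel = "rbfdot", nu = nu)
  nSV(m) / nrow(iris2)
})
```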


Karatzoglou, Alexandros, David Meyer, and Kurt Hornik. "Support Vector Machines in R." Journal of Statistical Software 15.9 (2006).

Niu, Lingfeng, et al. "Two New Decomposition Algorithms for Training Bound-Constrained Support Vector Machines." Foundations of Computing and Decision Sciences 40.1 (2015): 67-86.
