# Understanding the different formulations for SVM

I've been working with `kernlab` for more than a year now, but I have always stuck to the vanilla cost (`C-svc`) formulation for classification.

Of course, `kernlab` includes other formulations as well, and the manual only briefly cites the classification ones.
I know `C-svc` very well, understand the principles of `nu-svc`, and have only a vague idea of what a native multi-class formulation entails.

Could someone please shed some light on these different formulations and how they compare, where applicable, to `C-svc`? Information on the design considerations behind each, as well as robustness, sparseness, and sensitivity to high dimensionality or class imbalance, would be greatly useful.

• `C-svc`: C (Cost) classification
• `nu-svc`: $\nu$ (nSV) classification
• `C-bsvc`: bound-constraint SVM classification
• `spoc-svc`: Crammer-Singer native multi-class
• `kbb-svc`: Weston-Watkins native multi-class
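For orientation, here is a hedged sketch of the first two types in code. `kernlab` is an R package, but scikit-learn's `SVC` and `NuSVC` implement the same `C-svc` and `nu-svc` formulations (the bound-constrained and native multi-class variants have no direct scikit-learn counterpart); the toy data below is invented for illustration:

```python
# Sketch: C-svc vs nu-svc, using scikit-learn's SVC / NuSVC as stand-ins
# for kernlab's first two types. The data is made up.
from sklearn.svm import SVC, NuSVC

X = [[0, 0], [0, 1], [1, 0], [1, 1],   # class -1 cluster
     [4, 4], [4, 5], [5, 4], [5, 5]]   # class +1 cluster
y = [-1, -1, -1, -1, 1, 1, 1, 1]

# C-svc: regularization strength controlled by the cost C
c_model = SVC(kernel="linear", C=1.0).fit(X, y)

# nu-svc: controlled by nu, a lower bound on the fraction of
# support vectors (here at least 0.5 * 8 = 4 of them)
nu_model = NuSVC(kernel="linear", nu=0.5).fit(X, y)

print(len(c_model.support_), len(nu_model.support_))
```

The practical difference in parameterization: `C` is an abstract penalty weight, while `nu` directly bounds the fraction of margin errors from above and the fraction of support vectors from below.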
Contents

C-SVM:

$$\begin{align} &\text{minimize} &t(\mathbf w,b,\xi)={1\over2}\|\mathbf w\|_2^2+{C\over m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} &\left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge1-\xi_i &(\forall i=1,\dots,m) \\ \xi_i\ge0 \end{matrix}\right. \end{align}$$

`C-svc` is the common soft-margin SVM. The dual is:

$$\begin{align} &\text{minimize} &t(\alpha)={1\over2}\alpha^T\hat Q\alpha-\mathbf 1^T\alpha \\ &\text{subject to} &\left\{\begin{matrix} y^T\alpha=0 \\ \mathbf 0\le\alpha\le{C\over m}\mathbf 1 \end{matrix}\right. \end{align}$$

With $\hat Q_{ij} \equiv y_iy_j\,\Phi(x_i)^T\Phi(x_j)$.
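The dual follows from the standard Lagrangian (a textbook derivation, sketched here for completeness): with multipliers $\alpha_i,\beta_i\ge0$,

$$\mathcal L={1\over2}\|\mathbf w\|_2^2+{C\over m}\sum_{i=1}^m\xi_i-\sum_{i=1}^m\alpha_i\left[y_i(\langle\Phi(x_i),\mathbf w\rangle+b)-1+\xi_i\right]-\sum_{i=1}^m\beta_i\xi_i$$

Setting the gradients to zero gives $\mathbf w=\sum_i\alpha_iy_i\Phi(x_i)$, then $\sum_i\alpha_iy_i=0$ (the equality constraint $y^T\alpha=0$), and $\alpha_i+\beta_i={C\over m}$, which together with $\beta_i\ge0$ gives the box constraint $0\le\alpha_i\le{C\over m}$. Substituting back into $\mathcal L$ yields the dual.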

C-BSVM:

$$\begin{align} &\text{minimize} &t(\mathbf w,b,\xi)={1\over2}\|\mathbf w\|_2^2+{1\over2}b^2+{C\over m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} &\left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge1-\xi_i &(\forall i=1,\dots,m) \\ \xi_i\ge0 \end{matrix}\right. \end{align}$$

`C-bsvc` adds the offset $b$ to the objective (penalizing ${1\over2}b^2$). This simplifies the dual: the equality constraint $y^T\alpha=0$ disappears, and only bound constraints remain.

$$\begin{align} &\text{minimize} &t(\alpha)={1\over2}\alpha^TQ\alpha-\mathbf 1^T\alpha \\ &\text{subject to} &\mathbf 0\le\alpha\le{C\over m}\mathbf 1 \end{align}$$

With $Q_{ij} \equiv y_iy_j\left(\Phi(x_i)^T\Phi(x_j)+1\right)$; note that the $+1$ sits inside the parentheses, so it carries the $y_iy_j$ factor arising from the $b^2$ term.
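The extra ${1\over2}b^2$ term is equivalent to absorbing the offset into an augmented feature map (a standard trick, sketched here):

$$\tilde\Phi(x)=\begin{pmatrix}\Phi(x)\\1\end{pmatrix},\qquad \tilde{\mathbf w}=\begin{pmatrix}\mathbf w\\b\end{pmatrix} \quad\Longrightarrow\quad {1\over2}\|\tilde{\mathbf w}\|_2^2={1\over2}\|\mathbf w\|_2^2+{1\over2}b^2,\qquad \tilde\Phi(x_i)^T\tilde\Phi(x_j)=\Phi(x_i)^T\Phi(x_j)+1$$

Since $b$ is no longer a free primal variable, its stationarity condition $y^T\alpha=0$ never arises, which is why only the bound constraints survive in the dual.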

$\nu$-SVM:

$$\begin{align} &\text{minimize} &t(\mathbf w,b,\xi,\rho)={1\over2}\|\mathbf w\|_2^2-\nu\rho+{1\over m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} &\left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge\rho-\xi_i &(\forall i=1,\dots,m) \\ \xi_i\ge0, &\rho\ge0 \end{matrix}\right. \end{align}$$

`nu-svc` has the following dual:

$$\begin{align} &\text{minimize} &t(\alpha)={1\over2}\alpha^T\hat Q\alpha \\ &\text{subject to} &\left\{\begin{matrix} y^T\alpha=0 \\ \mathbf 0\le\alpha\le{1\over m}\mathbf 1 \\ \|\alpha\|_1=\sum_{i=1}^{m}\alpha_i\ge\nu \end{matrix}\right. \end{align}$$

Since each $\alpha_i$ is at most ${1\over m}$ and the $\alpha_i$ must sum to at least $\nu$, at least $\nu m$ elements of the $\alpha$ vector will be non-zero. In other words, $\nu$ is a lower bound on the fraction of support vectors.
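A tiny counting argument makes that lower bound concrete (pure-Python illustration; `min_nonzero` is a made-up helper name, not part of any library):

```python
def min_nonzero(m, nu):
    """Fewest non-zero entries any feasible alpha can have under the
    nu-SVM dual constraints 0 <= alpha_i <= 1/m and sum(alpha) >= nu:
    pack the maximum mass 1/m into as few entries as possible."""
    cap = 1.0 / m            # upper bound on each alpha_i
    k, total = 0, 0.0
    while total < nu - 1e-12:
        total += cap         # fill one more entry to its cap
        k += 1
    return k

print(min_nonzero(100, 0.25))   # -> 25: at least a quarter of the
                                # alphas (support vectors) are non-zero
```

Greedily saturating entries at their cap ${1\over m}$ is the sparsest feasible configuration, so the count it returns is exactly $\lceil\nu m\rceil$.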

Karatzoglou, Alexandros, David Meyer, and Kurt Hornik. "Support vector machines in R." (2005).

Niu, Lingfeng, et al. "Two New Decomposition Algorithms for Training Bound-Constrained Support Vector Machines." Foundations of Computing and Decision Sciences 40.1 (2015): 67-86.
