I've been working with `kernlab` for more than a year now, but I have always stuck to the vanilla cost (`C-svc`) formulation for classification.

Of course, `kernlab` includes other formulations; the manual only cites the classification formulations briefly.

I'm quite familiar with *vanilla* cost SVMs. For example, I know `C-svc` very well, have a vague idea of what a native multi-class formulation encompasses, and understand the principles of `nu-svc`.

Could someone please shed some light on these different formulations and how they compare, where applicable, to `C-svc`? Information on the trade-offs involved, robustness, sparseness, and sensitivity to high dimensionality or class imbalance would be greatly useful.

- `C-svc`: C (cost) classification
- `nu-svc`: $\nu$ (nSV) classification
- `C-bsvc`: bound-constraint SVM classification
- `spoc-svc`: Crammer–Singer native multi-class
- `kbb-svc`: Weston–Watkins native multi-class


#### Best Answer

C-SVM:

$$ \begin{align} &\text{minimize} & t(\mathbf w,b,\xi)= \frac{1}{2}\|\mathbf w\|_2^2+\frac{C}{m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} & \left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge 1-\xi_i & (\forall i=1,\dots,m) \\ \xi_i \ge 0 \end{matrix}\right. \end{align} $$

`C-svc` is the common SVM. The dual is:

$$ \begin{align} &\text{minimize} & t(\alpha)= \frac{1}{2}\alpha^T\hat Q\alpha-\mathbf 1^T\alpha \\ &\text{subject to} & \left\{\begin{matrix} y^T\alpha=0 \\ \mathbf 0\le\alpha\le\frac{C}{m}\mathbf 1 \end{matrix}\right. \end{align} $$

with $\hat Q_{ij} \equiv y_i y_j\,\Phi(x_i)^T\Phi(x_j)$.
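Although `kernlab` is an R package, the dual constraints above can be checked empirically with scikit-learn's `SVC` in Python, which solves the same C-SVM dual. Note one assumption: scikit-learn's objective uses $C\sum_i\xi_i$ rather than $\frac{C}{m}\sum_i\xi_i$, so the box bound on the dual variables is $C$ rather than $C/m$. A minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs as toy data (arbitrary choice)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

C = 1.0
clf = SVC(kernel="rbf", C=C).fit(X, y)

# dual_coef_ stores y_i * alpha_i for the support vectors
ya = clf.dual_coef_.ravel()
print(abs(ya.sum()))                         # equality constraint y^T alpha = 0
print(bool(np.all(np.abs(ya) <= C + 1e-9)))  # box constraint 0 <= alpha_i <= C
```

Both constraints hold at the solution: the signed dual coefficients sum to (numerically) zero, and none exceeds $C$ in magnitude.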

C-BSVM:

$$ \begin{align} &\text{minimize} & t(\mathbf w,b,\xi)= \frac{1}{2}\|\mathbf w\|_2^2+\frac{1}{2}b^2+\frac{C}{m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} & \left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge 1-\xi_i & (\forall i=1,\dots,m) \\ \xi_i \ge 0 \end{matrix}\right. \end{align} $$

`C-bsvc` adds the offset term $\frac{1}{2}b^2$ to the objective. This simplifies the dual: the equality constraint disappears and only the bound constraints remain.

$$ \begin{align} &\text{minimize} & t(\alpha)= \frac{1}{2}\alpha^T Q\alpha-\mathbf 1^T\alpha \\ &\text{subject to} & \mathbf 0\le\alpha\le\frac{C}{m}\mathbf 1 \end{align} $$

with $Q_{ij} \equiv y_i y_j\left(\Phi(x_i)^T\Phi(x_j)+1\right)$; the extra $+1$ inside the parentheses comes from absorbing the offset $b$ into the objective.
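Because only box constraints remain, the C-bsvc dual can be attacked with very simple solvers: projecting onto a box is just clipping. Below is a minimal projected-gradient sketch in Python/NumPy. This is not kernlab's algorithm (kernlab uses a decomposition solver); the RBF kernel, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix of the Gaussian RBF kernel
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def cbsvc_dual(X, y, C=10.0, iters=5000, lr=1e-3):
    """Projected gradient on (1/2) a^T Q a - 1^T a  s.t.  0 <= a <= C/m."""
    m = len(y)
    # Q_ij = y_i y_j (K(x_i, x_j) + 1); the +1 absorbs the offset b
    Q = np.outer(y, y) * (rbf_kernel(X, X) + 1.0)
    alpha = np.zeros(m)
    for _ in range(iters):
        grad = Q @ alpha - 1.0
        # Projection onto the box is just clipping -- there is no
        # equality constraint to maintain, unlike the C-svc dual
        alpha = np.clip(alpha - lr * grad, 0.0, C / m)
    return alpha

def decision(alpha, X, y, Xnew):
    # w = sum_i alpha_i y_i Phi(x_i),  b = sum_i alpha_i y_i
    coef = alpha * y
    return rbf_kernel(Xnew, X) @ coef + coef.sum()

# Toy usage: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha = cbsvc_dual(X, y)
acc = np.mean(np.sign(decision(alpha, X, y, X)) == y)
print(acc)
```

The point of the sketch is the projection step: with the equality constraint gone, each iteration is a gradient step followed by element-wise clipping, which is what makes bound-constrained decomposition methods attractive.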

$\nu$-SVM:

$$ \begin{align} &\text{minimize} & t(\mathbf w,b,\xi,\rho)= \frac{1}{2}\|\mathbf w\|_2^2-\nu\rho+\frac{1}{m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} & \left\{\begin{matrix} y_i(\langle\Phi(x_i),\mathbf w\rangle+b)\ge\rho-\xi_i & (\forall i=1,\dots,m) \\ \xi_i \ge 0, & \rho\ge 0 \end{matrix}\right. \end{align} $$

`nu-svc` has the following dual:

$$ \begin{align} &\text{minimize} & t(\alpha)= \frac{1}{2}\alpha^T\hat Q\alpha \\ &\text{subject to} & \left\{\begin{matrix} y^T\alpha=0 \\ \mathbf 0\le\alpha\le\frac{1}{m}\mathbf 1 \\ \|\alpha\|_1=\sum_{i=1}^{m}\alpha_i\ge\nu \end{matrix}\right. \end{align} $$

Combined with the bound $\alpha_i \le \frac{1}{m}$, this means at least $\nu m$ entries of $\alpha$ must be nonzero. In other words, $\nu$ is a lower bound on the fraction of support vectors.
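This $\nu$-property is easy to check empirically. A hedged sketch using scikit-learn's `NuSVC` (an analogue of kernlab's `nu-svc`; the data and kernel are arbitrary assumptions):

```python
import numpy as np
from sklearn.svm import NuSVC

# Overlapping blobs, so margin errors (and hence support vectors) exist
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf").fit(X, y)
    frac_sv = clf.support_.size / len(y)
    print(nu, frac_sv)  # frac_sv should be at least nu
```

As $\nu$ grows, the number of support vectors grows with it, which is exactly the "nSV" interpretation from the table above: $\nu$ trades sparseness directly against margin errors.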

Karatzoglou, Alexandros, David Meyer, and Kurt Hornik. "Support vector machines in R." (2005).

Niu, Lingfeng, et al. "Two New Decomposition Algorithms for Training Bound-Constrained Support Vector Machines." Foundations of Computing and Decision Sciences 40.1 (2015): 67–86.
