I've been working with kernlab for more than a year now, but I have always stuck to the vanilla cost (C-svc) formulation for classification.
Of course, kernlab includes other formulations, and the manual cites the classification ones only briefly. I'm quite familiar with vanilla cost SVMs: I know the C-svc well, I understand the principles of nu-svc, and I have only a vague idea of what a native multi-class formulation involves.
Could someone please shed some light on these different formulations and how they compare, where applicable, to C-svc? Information on the trade-offs involved (robustness, sparseness, sensitivity to high dimensionality or class imbalance) would be greatly useful.
For reference, the formulations listed in the manual (invoked below via ksvm's type argument):

- C-svc: C (cost) classification
- nu-svc: $\nu$ (nSV) classification
- C-bsvc: bound-constraint SVM classification
- spoc-svc: Crammer-Singer native multi-class
- kbb-svc: Weston-Watkins native multi-class
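For what it's worth, here is how I have been invoking them; a minimal sketch, with iris and the default RBF kernel as arbitrary placeholders, since all five are selected through ksvm's type argument:

```r
library(kernlab)

data(iris)
set.seed(1)  # kpar = "automatic" estimates the RBF width from a random sample

types <- c("C-svc", "nu-svc", "C-bsvc", "spoc-svc", "kbb-svc")
fits <- lapply(types, function(tp)
  ksvm(Species ~ ., data = iris, type = tp, kernel = "rbfdot"))
names(fits) <- types

## Training error per formulation: the binary ones are wrapped in
## pairwise coupling for the 3-class problem, while spoc-svc and
## kbb-svc solve a single native multi-class problem.
sapply(fits, error)
```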
Best Answer
C-SVM:
$$\begin{align} &\text{minimize} & t(\mathbf w, b, \boldsymbol\xi) = \frac{1}{2}\|\mathbf w\|_2^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} & \begin{cases} y_i(\langle\Phi(x_i), \mathbf w\rangle + b) \ge 1 - \xi_i & (\forall i = 1,\dots,m) \\ \xi_i \ge 0 \end{cases} \end{align}$$
C-svc is the common SVM. Its dual is:
$$\begin{align} &\text{minimize} & t(\alpha) = \frac{1}{2}\alpha^T \hat Q \alpha - \mathbf 1^T \alpha \\ &\text{subject to} & \begin{cases} y^T \alpha = 0 \\ \mathbf 0 \le \alpha \le \frac{C}{m}\mathbf 1 \end{cases} \end{align}$$
With $\hat Q_{ij} \equiv y_i y_j \Phi(x_i)^T \Phi(x_j)$.
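To make the role of $C$ concrete, a small sketch (a two-class subset of iris, RBF kernel, arbitrary cost values): penalizing slack more heavily usually leaves fewer margin violations, and hence often fewer support vectors.

```r
library(kernlab)

data(iris)
set.seed(1)
## Two-class subset, so a single binary C-svc problem is solved.
iris2 <- droplevels(iris[iris$Species != "setosa", ])

for (C in c(0.01, 1, 100)) {
  fit <- ksvm(Species ~ ., data = iris2, type = "C-svc",
              kernel = "rbfdot", C = C)
  cat(sprintf("C = %6.2f  support vectors: %3d\n", C, nSV(fit)))
}
```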
C-BSVM:
$$\begin{align} &\text{minimize} & t(\mathbf w, b, \boldsymbol\xi) = \frac{1}{2}\|\mathbf w\|_2^2 + \frac{1}{2}b^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} & \begin{cases} y_i(\langle\Phi(x_i), \mathbf w\rangle + b) \ge 1 - \xi_i & (\forall i = 1,\dots,m) \\ \xi_i \ge 0 \end{cases} \end{align}$$
C-bsvc adds the offset $b$ to the objective. This simplifies the dual: the equality constraint $y^T\alpha = 0$ disappears, leaving only bound constraints.
$$\begin{align} &\text{minimize} & t(\alpha) = \frac{1}{2}\alpha^T Q \alpha - \mathbf 1^T \alpha \\ &\text{subject to} & \mathbf 0 \le \alpha \le \frac{C}{m}\mathbf 1 \end{align}$$
With $Q_{ij} \equiv y_i y_j \left(\Phi(x_i)^T \Phi(x_j) + 1\right)$.
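The practical difference is easiest to see side by side; a sketch under the same arbitrary setup as above. Because the objectives differ by the $\frac{1}{2}b^2$ term, the two fits need not coincide: in C-bsvc the offset is part of the regularizer and is shrunk toward zero.

```r
library(kernlab)

data(iris)
set.seed(1)
iris2 <- droplevels(iris[iris$Species != "setosa", ])

fit_c <- ksvm(Species ~ ., data = iris2, type = "C-svc",  C = 1)
fit_b <- ksvm(Species ~ ., data = iris2, type = "C-bsvc", C = 1)

## Compare the offsets and the support-vector counts.
c(b(fit_c), b(fit_b))
c(nSV(fit_c), nSV(fit_b))
```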
$\nu$-SVM:
$$\begin{align} &\text{minimize} & t(\mathbf w, b, \boldsymbol\xi, \rho) = \frac{1}{2}\|\mathbf w\|_2^2 - \nu\rho + \frac{1}{m}\sum_{i=1}^{m}\xi_i \\ &\text{subject to} & \begin{cases} y_i(\langle\Phi(x_i), \mathbf w\rangle + b) \ge \rho - \xi_i & (\forall i = 1,\dots,m) \\ \xi_i \ge 0, \quad \rho \ge 0 \end{cases} \end{align}$$
nu-svc has the following dual:
$$\begin{align} &\text{minimize} & t(\alpha) = \frac{1}{2}\alpha^T \hat Q \alpha \\ &\text{subject to} & \begin{cases} y^T \alpha = 0 \\ \mathbf 0 \le \alpha \le \frac{1}{m}\mathbf 1 \\ \|\alpha\|_1 = \sum_{i=1}^{m}\alpha_i \ge \nu \end{cases} \end{align}$$
Since each $\alpha_i$ is capped at $1/m$, satisfying $\sum_i \alpha_i \ge \nu$ requires at least $\nu m$ non-zero entries in $\alpha$. In other words, $\nu$ is a lower bound on the fraction of support vectors.
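That bound is easy to check empirically; a sketch on a two-class subset of iris with an arbitrary $\nu$: the observed fraction of support vectors should come out at or above the requested $\nu$ (up to solver tolerance).

```r
library(kernlab)

data(iris)
set.seed(1)
iris2 <- droplevels(iris[iris$Species != "setosa", ])

nu  <- 0.3
fit <- ksvm(Species ~ ., data = iris2, type = "nu-svc",
            kernel = "rbfdot", nu = nu)

## nu lower-bounds the fraction of support vectors.
nSV(fit) / nrow(iris2)
```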
Karatzoglou, Alexandros, David Meyer, and Kurt Hornik. "Support vector machines in R." (2005).