I was reading Andrew Ng's machine learning lecture notes on SVMs and came across the following equation for the optimal value of the intercept term $b$ in the SVM problem:

$$b^* = -\frac{\max_{i: y^{(i)}=-1} w^{*T}x^{(i)} + \min_{i: y^{(i)}=1} w^{*T}x^{(i)}}{2}$$

However, I have no idea how the intercept term $b$ is derived by solving the primal problem.

I believe the primal's Lagrangian is:

$$\min_{w,b} \max_{\alpha} \mathcal{L}(w,b,\alpha) = \min_{w,b} \max_{\alpha} \frac{1}{2} \|w\|^2 - \sum_{i=1}^m \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$$

But how do I solve for $b$? Any help would be great. Thank you very much.

#### Best Answer

I have a geometric explanation. Think of the SVM as a maximum-margin classifier. In that sense we seek the separating hyperplane that is equidistant from the closest negative and the closest positive example, with that common distance as large as possible. Let $w^*$ be known. Then $$\max_{i: y^{(i)}=-1} w^{*T}x^{(i)}$$ is the closest (worst-case) distance among all negative examples. Similarly, $$\min_{i: y^{(i)}=1} w^{*T}x^{(i)}$$ is the closest (worst-case) distance among all positive examples. How can we choose the intercept so that the smaller of these two worst-case distances to the hyperplane is as large as possible? We place the hyperplane halfway between them, i.e. we take the average of the two.
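
The same average can also be reached algebraically. As a sketch, assuming the data are linearly separable and $(w^*, b^*)$ solves the hard-margin primal, the closest example on each side meets its constraint $y^{(i)}(w^{*T}x^{(i)} + b^*) \ge 1$ with equality (otherwise $b$ could be shifted and $w$ shrunk, which would lower $\frac{1}{2}\|w\|^2$):

$$\min_{i: y^{(i)}=1} w^{*T}x^{(i)} + b^* = 1, \qquad \max_{i: y^{(i)}=-1} w^{*T}x^{(i)} + b^* = -1.$$

Adding the two equations and solving for $b^*$ gives

$$b^* = -\frac{\max_{i: y^{(i)}=-1} w^{*T}x^{(i)} + \min_{i: y^{(i)}=1} w^{*T}x^{(i)}}{2}.$$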

The '-' sign.

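Strictly speaking, $\max_{i: y^{(i)}=-1} w^{*T}x^{(i)}$ and $\min_{i: y^{(i)}=1} w^{*T}x^{(i)}$ are signed projections onto $w^*$ rather than distances: the first is typically negative and the second positive when the two classes lie on opposite sides of the origin. The midpoint between the worst negative and the worst positive example has projection equal to the average of the two, and the hyperplane $w^{*T}x + b^* = 0$ passes through that midpoint only if $b^*$ cancels this average. So $b^*$ must be the average with the '-' sign in front.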
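
To make the formula concrete, here is a minimal sketch on a hand-made toy dataset. The four points, the value `w_star = (1, 0)` (worked out by hand for these points, not produced by a solver) and the helper name `intercept_from_w` are all assumptions made for illustration.

```python
def intercept_from_w(w, X, y):
    """Return b = -(max over negatives of w.x + min over positives of w.x) / 2."""
    # Signed projection w^T x for every example.
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
    max_neg = max(s for s, label in zip(scores, y) if label == -1)
    min_pos = min(s for s, label in zip(scores, y) if label == 1)
    return -(max_neg + min_pos) / 2.0


# Toy, linearly separable data (illustrative assumption).
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
y = [1, 1, -1, -1]

# Assumed optimal weight vector for this toy set, derived by hand.
w_star = (1.0, 0.0)

b_star = intercept_from_w(w_star, X, y)

# Functional margins y * (w^T x + b); the closest point on each side should give 1.
margins = [
    label * (sum(wj * xj for wj, xj in zip(w_star, x)) + b_star)
    for x, label in zip(X, y)
]

print(b_star)   # -1.0
print(margins)  # [1.0, 2.0, 1.0, 2.0]
```

Here the worst negative projects to 0 and the worst positive to 2, so the hyperplane sits at projection 1 and $b^* = -1$; the closest point of each class then has margin exactly 1.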