Solved: logistic regression gradient of weights

I am reading about logistic regression and am looking at the negative log-likelihood function. The text takes the gradient with respect to the weights and produces the result at the bottom of page 7. I calculated this myself and can't seem to get the solution they arrived at.

They set

$$NLL = -\sum_{i=1}^N\left[(1-y_i)\log\big(1-s(w^Tx_i)\big)+y_i\log\big(s(w^Tx_i)\big)\right]$$

where $s$ is the sigmoid function $s(x) = \frac{1}{e^{-x}+1}$.

When I take $\frac{\partial NLL}{\partial w}$, I get
$$-\sum_{i=1}^N \left(\frac{x_i(y_i-1)e^{w^Tx_i}}{e^{w^Tx_i}+1} + \frac{x_iy_i}{e^{w^Tx_i}+1}\right)$$

and not $$\sum_{i=1}^N \big(s(w^Tx_i)-y_i\big)x_i.$$


I must be making a mistake, since this is just a simple gradient calculation. Can anyone shed some light on how this was computed?

It is a simple calculation, but one can easily make a mistake. We have

$$\frac{\partial s(x)}{\partial x} = s(x)\big(1 - s(x)\big), \qquad \frac{\partial s(w^Tx_i)}{\partial w} = x_i\big(1 - s(w^Tx_i)\big)s(w^Tx_i), \qquad \frac{\partial \log(x)}{\partial x} = \frac{1}{x},$$
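(The first identity, which does all the work here, is quick to verify from the definition of $s$:

$$\frac{\partial s(x)}{\partial x} = \frac{e^{-x}}{(e^{-x}+1)^2} = \frac{1}{e^{-x}+1}\cdot\frac{e^{-x}}{e^{-x}+1} = s(x)\big(1-s(x)\big),$$

since $\frac{e^{-x}}{e^{-x}+1} = 1 - \frac{1}{e^{-x}+1} = 1-s(x)$.)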

so the derivative is

$$\frac{\partial NLL}{\partial w} = \sum_{i=1}^{N}\left[(1 - y_i)\frac{x_i\big(1 - s(w^Tx_i)\big)s(w^Tx_i)}{1 - s(w^Tx_i)} - y_i\frac{x_i\big(1 - s(w^Tx_i)\big)s(w^Tx_i)}{s(w^Tx_i)}\right]$$
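Cancelling the common factor in each fraction, every summand collapses because the $y_i\,s(w^Tx_i)\,x_i$ terms cancel:

$$(1-y_i)\,s(w^Tx_i)\,x_i - y_i\big(1-s(w^Tx_i)\big)x_i = \big(s(w^Tx_i) - y_i s(w^Tx_i) - y_i + y_i s(w^Tx_i)\big)x_i = \big(s(w^Tx_i)-y_i\big)x_i,$$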

so the gradient indeed simplifies to

$$\sum_{i=1}^{N} x_i\big(s(w^Tx_i) - y_i\big).$$

In fact, the expression in the question is the same quantity before simplification: substituting $s(w^Tx_i) = \frac{e^{w^Tx_i}}{e^{w^Tx_i}+1}$ and $1 - s(w^Tx_i) = \frac{1}{e^{w^Tx_i}+1}$ into this sum recovers it exactly, so no mistake was made there either.
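As a sanity check, here is a minimal numerical verification (a sketch in NumPy; the random data and tolerance are illustrative, not from the original text) comparing the analytic gradient $\sum_i x_i\big(s(w^Tx_i)-y_i\big)$ against a finite-difference approximation of the NLL:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (np.exp(-z) + 1.0)

def nll(w, X, y):
    # NLL = -sum_i [(1-y_i) log(1 - s(w^T x_i)) + y_i log(s(w^T x_i))]
    p = sigmoid(X @ w)
    return -np.sum((1 - y) * np.log(1 - p) + y * np.log(p))

def analytic_grad(w, X, y):
    # sum_i x_i (s(w^T x_i) - y_i), vectorized as X^T (p - y)
    return X.T @ (sigmoid(X @ w) - y)

def numeric_grad(w, X, y, eps=1e-6):
    # Central finite differences, one coordinate of w at a time
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (nll(w + e, X, y) - nll(w - e, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))               # 50 illustrative examples, 3 features
y = (rng.random(50) < 0.5).astype(float)   # random 0/1 labels
w = rng.normal(size=3)

# The two gradients should agree to roughly finite-difference precision.
print(np.max(np.abs(analytic_grad(w, X, y) - numeric_grad(w, X, y))))
```

If the formula were off by a sign or a factor, the printed discrepancy would be large rather than near finite-difference precision.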
