# Solved – logistic regression gradient of weights

I am reading about logistic regression (from https://piazza-resources.s3.amazonaws.com/h61o5linlbb1v0/h8exwp8dmm44ok/classificationV6.pdf?AWSAccessKeyId=AKIAIEDNRLJ4AZKBW6HA&Expires=1485650876&Signature=Rd4BqBgb4hPwWUjxAyxJNfPhklU%3D) and am looking at the negative log likelihood function. They take the gradient with respect to the weights and produce the result at the bottom of page 7. I calculated this myself and can't seem to get the solution that they arrived at.

They set

\$\$NLL = -\sum_{i=1}^N\left[(1-y_i)\log\big(1-s(w^Tx_i)\big)+y_i\log\big(s(w^Tx_i)\big)\right]\$\$

where \$s\$ is the sigmoid function \$s(x) = \frac{1}{1+e^{-x}}\$.

When I take \$\frac{\partial NLL}{\partial w}\$, I get
\$\$ -\sum_{i=1}^N \left( \frac{x_i(y_i-1)e^{w^Tx_i}}{e^{w^Tx_i}+1} + \frac{x_iy_i}{e^{w^Tx_i}+1}\right)\$\$

and not \$\$ \sum_{i=1}^N \big(s(w^Tx_i)-y_i\big)x_i.\$\$

I must be making a mistake since this is just a simple gradient calculation. Can anyone shed some light onto how this was computed?


It is a simple calculation, but one can easily make a mistake. We have

\$\$\frac{\partial s(x)}{\partial x} = s(x)\big(1 - s(x)\big), \qquad \frac{\partial s(w^Tx_i)}{\partial w} = x_i\big(1 - s(w^Tx_i)\big)s(w^Tx_i), \qquad \frac{\partial \log(x)}{\partial x} = \frac{1}{x}\$\$
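As a sanity check on the sigmoid derivative identity used above, a central finite difference can be compared against \$s(x)(1-s(x))\$ (a minimal NumPy sketch; the evaluation point and step size are arbitrary choices):

```python
import numpy as np

def s(x):
    # sigmoid: s(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7
h = 1e-6
numeric = (s(x + h) - s(x - h)) / (2 * h)   # central-difference derivative
analytic = s(x) * (1 - s(x))                # the identity s'(x) = s(x)(1 - s(x))
assert abs(numeric - analytic) < 1e-8
```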

so the derivative is

\$\$\frac{\partial NLL}{\partial w} = \sum_{i=1}^{N} (1 - y_i)\frac{x_i\big(1 - s(w^Tx_i)\big)s(w^Tx_i)}{1 - s(w^Tx_i)} - y_i\frac{x_i\big(1 - s(w^Tx_i)\big)s(w^Tx_i)}{s(w^Tx_i)}\$\$

and I checked that it indeed simplifies to

\$\$\sum_{i=1}^{N} x_i\big(s(w^Tx_i) - y_i\big).\$\$
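The simplified gradient can also be verified numerically by comparing it against a finite-difference approximation of the NLL on random data (a minimal NumPy sketch; the data shapes, seed, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.normal(size=(N, d))        # rows are the x_i
y = rng.integers(0, 2, size=N)     # labels in {0, 1}
w = rng.normal(size=d)

def s(z):
    # sigmoid
    return 1.0 / (1.0 + np.exp(-z))

def nll(w):
    # NLL = -sum_i [(1 - y_i) log(1 - s(w^T x_i)) + y_i log(s(w^T x_i))]
    p = s(X @ w)
    return -np.sum((1 - y) * np.log(1 - p) + y * np.log(p))

# analytic gradient: sum_i x_i (s(w^T x_i) - y_i)
grad = X.T @ (s(X @ w) - y)

# central finite differences, one coordinate at a time
h = 1e-6
for j in range(d):
    e = np.zeros(d)
    e[j] = h
    numeric = (nll(w + e) - nll(w - e)) / (2 * h)
    assert abs(numeric - grad[j]) < 1e-5
```

If any coordinate of the analytic gradient disagreed with the finite-difference estimate, the corresponding assertion would fail, so passing this check is good evidence the simplified form is correct.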
