I have been trying to figure out the logistic regression implementation in KNIME. The tool says it uses Fisher scoring (FS). I understand the Newton-Raphson method from http://www.win-vector.com/blog/2011/09/the-simpler-derivation-of-logistic-regression/ but can someone explain the exact difference between Fisher scoring and the Newton-Raphson method?

This is my knowledge of FS:

1) There is a score function $V(\theta)$, which is the gradient (first derivative) of the log-likelihood function (reference: Wikipedia).

2) For the weight updates, the Hessian of the log-likelihood is used.

Both of these steps are also part of the Newton-Raphson method. The derivation I read never mentions a score function, but it does take the first derivative and compute the Hessian.

Fisher information is described as the **variance of the score** (which is the Jacobian), or as the **expected value of the observed information** (which is the Hessian). In the final weight-update equation that uses the Fisher information, I don't understand how to take the expected value of the Hessian. Is it something like subtracting the column mean from each entry, and then multiplying the resulting matrix by the score to obtain the second term on the right-hand side of the weight-update equation?

I know my understanding of the algorithm is cluttered. Can someone detail the step-by-step procedure for calculating the Fisher information?


#### Best Answer

Logistic regression is a generalized linear model with a canonical link, which means the expected information matrix (EIM), or Fisher information, is the same as the observed information matrix (OIM). Fisher scoring and Newton-Raphson therefore produce identical updates for this model. The information matrix is the negative of the Hessian of the log-likelihood; its inverse, evaluated at the parameter estimates, gives the estimated covariance matrix of those estimates.
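To make the update concrete, here is a minimal NumPy sketch of Fisher scoring for logistic regression. It is an illustration under stated assumptions, not KNIME's actual implementation: the function names, the toy data, and the stopping rule are all hypothetical. The update is $\beta \leftarrow \beta + I(\beta)^{-1} V(\beta)$, with score $V(\beta) = X^\top (y - p)$ and information $I(\beta) = X^\top W X$, where $W = \mathrm{diag}(p_i (1 - p_i))$. Because $I(\beta)$ does not contain $y$, taking its expectation over $y$ changes nothing, so no explicit averaging or mean-centering step is needed.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def information(X, beta):
    """X^T W X with W = diag(p * (1 - p)); equals the negative Hessian here."""
    p = sigmoid(X @ beta)
    w = p * (1.0 - p)
    return X.T @ (X * w[:, None])


def fisher_scoring_logistic(X, y, n_iter=25, tol=1e-10):
    """Fisher scoring; for the logit link this is the same as Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        score = X.T @ (y - p)            # gradient of the log-likelihood
        step = np.linalg.solve(information(X, beta), score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:   # hypothetical stopping rule
            break
    return beta


# Hypothetical toy data: intercept plus two standard-normal features.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_hat = fisher_scoring_logistic(X, y)
info_hat = information(X, beta_hat)      # information at the estimates
cov_hat = np.linalg.inv(info_hat)        # estimated covariance of beta_hat
std_err = np.sqrt(np.diag(cov_hat))      # standard errors
```

With a non-canonical link (for example probit), the observed information would depend on $y$, and the two methods would then take different steps.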
