How to compute the gradient for logistic regression in Matlab

I'm trying to minimize a function f. At first I used fminsearch, but it took a long time, so now I use fminunc. The problem is that I need to supply the gradient of the function to speed it up.

    f = @(w) sum(log(1 + exp(-t .* (phis * w'))))/size(phis, 1) + coef * w*w';
    options = optimset('Display', 'notify', 'MaxFunEvals', 2e+6, 'MaxIter', 2e+6);
    w = fminunc(f, ones(1, size(phis, 2)), options);
  • phis is Nx(N+1)
  • t is Nx1
  • coef is a scalar constant

Could you please help me construct the gradient of f? Without it I always get this warning:

Warning: Gradient must be provided for trust-region algorithm;   using line-search algorithm instead. 
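The warning appears because fminunc's trust-region algorithm needs a user-supplied gradient, and fminunc only uses one when the objective returns it as a second output and the optimset option 'GradObj' is 'on'. A minimal sketch of that wiring, assuming the objective from the question; the function name `logreg_obj` and the random toy data are hypothetical, for illustration only. Local functions at the end of a script need MATLAB R2016b or later; on older versions, put `logreg_obj` in its own file `logreg_obj.m`.

```matlab
% Toy data (hypothetical, only so the sketch runs on its own)
N = 9;
phis = rand(N, N+1);                 % N x (N+1)
t = rand(N, 1);                      % N x 1
coef = 0.5;                          % regularization constant

% 'GradObj','on' tells fminunc that the objective's second output is the gradient
options = optimset('Display', 'notify', 'GradObj', 'on', ...
                   'MaxFunEvals', 2e+6, 'MaxIter', 2e+6);
w0 = ones(1, size(phis, 2));         % row-vector start point, as in the question
[w, fval, exitflag] = fminunc(@(w) logreg_obj(w, phis, t, coef), w0, options);

function [fval, g] = logreg_obj(w, phis, t, coef)
  % Objective from the question; w is a 1 x (N+1) row vector
  e = exp(-t .* (phis * w'));        % N x 1
  fval = sum(log(1 + e)) / size(phis, 1) + coef * (w * w');
  if nargout > 1
    % chain rule: mean over i of e_i/(1+e_i) * (-t_i * phis(i,:)), plus 2*coef*w;
    % returned with the same (row) shape as w
    g = (-t .* e ./ (1 + e))' * phis / size(phis, 1) + 2 * coef * w;
  end
end
```

Computing the gradient only when `nargout > 1` keeps calls that need just the function value cheap.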

By the chain rule, the gradient should be:

    % the gradient
    % helper function
    expt = @(w) exp(-t .* (phis * w'));
    % precompute -t .* phis (row i is -t(i) * phis(i,:))
    tphis = -diag(t) * phis;   % or bsxfun(@times, -t, phis)
    % the gradient (returned as a column vector)
    gradf = @(w) sum(bsxfun(@times, expt(w) ./ (1 + expt(w)), tphis), 1)' / size(phis, 1) + 2*coef * w';
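Written out in symbols, the code above computes the following. Here φ_i denotes the i-th row of phis, t_i the i-th entry of t, c = coef, N = size(phis, 1), and w is a row vector as in the question; these symbols are introduced only for this derivation.

```latex
f(w) = \frac{1}{N}\sum_{i=1}^{N} \log\!\left(1 + e^{-t_i \phi_i w^\top}\right) + c\, w w^\top
\\[4pt]
\frac{d}{du}\log\!\left(1 + e^{u}\right) = \frac{e^{u}}{1 + e^{u}},
\qquad
\nabla_w\!\left(-t_i \phi_i w^\top\right) = -t_i \phi_i^\top,
\qquad
\nabla_w\!\left(c\, w w^\top\right) = 2c\, w^\top
\\[4pt]
\nabla f(w) = \frac{1}{N}\sum_{i=1}^{N} \frac{e^{-t_i \phi_i w^\top}}{1 + e^{-t_i \phi_i w^\top}}\left(-t_i \phi_i^\top\right) + 2c\, w^\top
```

The fraction is `expt(w) ./ (1 + expt(w))`, and -t_i φ_i is row i of `tphis`.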

It would probably be faster not to compute expt(w) twice per evaluation; you can rewrite this in terms of another anonymous function that takes exptw as input.
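A minimal sketch of that two-stage rewrite, assuming the definitions of expt and tphis from the answer; the names `gradf_e`, `gradf_old`, and the random toy data are hypothetical and included only so the block runs on its own.

```matlab
% Toy data (hypothetical) so the sketch runs on its own
N = 9;
phis = rand(N, N+1);
t = rand(N, 1);
coef = rand(1);

expt  = @(w) exp(-t .* (phis * w'));   % N x 1
tphis = bsxfun(@times, -t, phis);      % row i is -t(i) * phis(i,:)

% Second-stage helper: takes the already-computed exptw as input
gradf_e = @(exptw, w) sum(bsxfun(@times, exptw ./ (1 + exptw), tphis), 1)' / size(phis, 1) ...
                      + 2*coef * w';

% expt(w) is now evaluated only once per gradient evaluation
gradf = @(w) gradf_e(expt(w), w);

% Original version, which evaluates expt(w) twice, for comparison
gradf_old = @(w) sum(bsxfun(@times, expt(w) ./ (1 + expt(w)), tphis), 1)' / size(phis, 1) + 2*coef * w';

w0 = randn(1, N+1);
g_new = gradf(w0);
g_old = gradf_old(w0);
```

Both versions perform the same arithmetic, so `g_new` and `g_old` should agree to rounding.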

Also, I may have goofed up the dimensions of the sum: it seems you are using w as a row vector, which is somewhat nonstandard (gradf above returns a column vector).
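For comparison, here is a sketch of the same objective and gradient under the more common column-vector convention. This reformulation is not from the original answer: it replaces the bsxfun/sum with a single matrix-vector product and uses the identity e/(1+e) = 1/(1+exp(u)) for e = exp(-u). The names `fcol`, `sig`, `gradcol`, and the toy data are hypothetical.

```matlab
% Toy data (hypothetical)
N = 9;
phis = rand(N, N+1);
t = rand(N, 1);
coef = rand(1);

% Column-vector convention: w is (N+1) x 1
fcol = @(w) sum(log(1 + exp(-t .* (phis * w)))) / size(phis, 1) + coef * (w' * w);

tphis = bsxfun(@times, -t, phis);               % row i is -t(i) * phis(i,:)
sig   = @(w) 1 ./ (1 + exp(t .* (phis * w)));   % equals e./(1+e) with e = exp(-t.*(phis*w))
% (N+1) x N times N x 1: sums sig_i * (-t_i * phis(i,:))' over the rows
gradcol = @(w) tphis' * sig(w) / size(phis, 1) + 2*coef * w;

% Row-vector gradient from the answer, for checking
expt  = @(w) exp(-t .* (phis * w'));
gradf = @(w) sum(bsxfun(@times, expt(w) ./ (1 + expt(w)), tphis), 1)' / size(phis, 1) + 2*coef * w';

wc = randn(N+1, 1);
diffmax = max(abs(gradcol(wc) - gradf(wc')));
```

With a column start point such as `ones(size(phis, 2), 1)`, fminunc then works with column vectors throughout.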

Edit: as @whuber noted, this kind of thing is easy to screw up. I hadn't actually run the code I posted previously; the version above should now be correct. To test it, I estimated the gradient numerically and compared it to the 'exact' value, as below:

    % set up the problem
    N = 9;
    phis = rand(N, N+1);
    t = rand(N, 1);
    coef = rand(1);

    % the objective
    f = @(w) sum(log(1 + exp(-t .* (phis * w'))), 1) / size(phis, 1) + coef * w*w';

    % helper function
    expt = @(w) exp(-t .* (phis * w'));
    % precompute -t .* phis
    tphis = -diag(t) * phis;   % or bsxfun(@times, -t, phis)
    % the gradient
    gradf = @(w) sum(bsxfun(@times, expt(w) ./ (1 + expt(w)), tphis), 1)' / size(phis, 1) + 2*coef * w';

    % test the code now:
    % compute the approximate gradient numerically
    w0 = randn(1, N+1);
    fw = f(w0);

    % the numerical (forward differences):
    delta = 1e-6;
    eyeN = eye(N+1);
    gfw = nan(size(w0));
    for iii = 1:numel(w0)
        gfw(iii) = (f(w0 + delta * eyeN(iii,:)) - fw) ./ delta;
    end

    % the 'exact':
    truegfw = gradf(w0);

    % report
    fprintf('max difference between exact and numerical is %g\n', max(abs(truegfw' - gfw)));

When I run this (sorry, I should have set the random seed), I get:

max difference between exact and numerical is 4.80006e-07

YMMV, depending on the random seed and the value of delta used.
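The forward difference above has a truncation error proportional to delta, which limits how closely it can match the exact gradient. A sketch of a central-difference check, a swapped-in variant rather than the original answer's code, whose truncation error is proportional to delta^2; the toy data, `step`, and `maxdiff` are hypothetical names for illustration.

```matlab
% Toy data (hypothetical)
N = 9;
phis = rand(N, N+1);
t = rand(N, 1);
coef = rand(1);

% Objective and gradient as in the answer
f = @(w) sum(log(1 + exp(-t .* (phis * w'))), 1) / size(phis, 1) + coef * w*w';
expt  = @(w) exp(-t .* (phis * w'));
tphis = -diag(t) * phis;
gradf = @(w) sum(bsxfun(@times, expt(w) ./ (1 + expt(w)), tphis), 1)' / size(phis, 1) + 2*coef * w';

w0 = randn(1, N+1);
delta = 1e-5;
eyeN = eye(N+1);
gfw = nan(size(w0));
for iii = 1:numel(w0)
    step = delta * eyeN(iii,:);
    % central difference: truncation error is O(delta^2) instead of O(delta)
    gfw(iii) = (f(w0 + step) - f(w0 - step)) / (2*delta);
end

maxdiff = max(abs(gradf(w0)' - gfw));
fprintf('max difference between exact and central difference is %g\n', maxdiff);
```

Each component costs two function evaluations instead of one, which is usually acceptable for a one-off correctness check.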
