I'm trying to minimize a function f. At first I used fminsearch, but it takes a long time, so now I use fminunc. There is one problem: I need the function's gradient to speed it up.
f = @(w) sum(log(1 + exp(-t .* (phis * w'))))/size(phis, 1) + coef * w*w';
options = optimset('Display', 'notify', 'MaxFunEvals', 2e+6, 'MaxIter', 2e+6);
w = fminunc(f, ones(1, size(phis, 2)), options);
- phis has size N x (N+1)
- t has size N x 1
- coef is a constant
Can you please help me construct the gradient of f? I always get this warning:
Warning: Gradient must be provided for trust-region algorithm; using line-search algorithm instead.
Best Answer
The gradient should be (by chain rule)
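For reference, here is the same gradient written out in symbols, matching the code below; N = size(phis,1), phi_i is the i-th row of phis, and t_i is the i-th entry of t:

$$\nabla f(w) \;=\; \frac{1}{N}\sum_{i=1}^{N}\frac{e^{-t_i\,\phi_i w^{\top}}}{1 + e^{-t_i\,\phi_i w^{\top}}}\,\bigl(-t_i\,\phi_i^{\top}\bigr) \;+\; 2\,\mathrm{coef}\,w^{\top}$$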
%helper function
expt = @(w)(exp(-t .* (phis * w')));
%precompute -t .* phis (row-wise scaling)
tphis = -diag(t) * phis; %or -bsxfun(@times,t,phis);
%the gradient
gradf = @(w)((sum(bsxfun(@times,expt(w) ./ (1 + expt(w)), tphis),1)'/size(phis,1)) + 2*coef * w');
It would probably be faster not to compute expt(w) twice per evaluation, so you can rewrite this in terms of another anonymous function that takes exptw as input, as sketched below.
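A minimal sketch of that rewrite (the helper name gradcore is mine, not from the original):

%gradcore takes the precomputed value exptw, so expt(w) is evaluated only once
gradcore = @(w, exptw)((sum(bsxfun(@times, exptw ./ (1 + exptw), tphis), 1)'/size(phis,1)) + 2*coef * w');
gradf = @(w) gradcore(w, expt(w));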
Also, I may have goofed up the dimensions on the sum; it seems like you are using w as a row vector, which is somewhat nonstandard.
Edit: as @whuber noted, this kind of thing is easy to screw up; I didn't actually try the code I posted previously. The above should be correct now. To test it, I estimated the gradient numerically and compared it to the 'exact' value, as below:
%set up the problem
N = 9;
phis = rand(N,N+1);
t = rand(N,1);
coef = rand(1);
%the objective
f = @(w)((sum(log(1 + exp(-t .* (phis * w'))),1) / size(phis, 1)) + coef * w*w');
%helper function
expt = @(w)(exp(-t .* (phis * w')));
%precompute -t .* phis
tphis = -diag(t) * phis; %or -bsxfun(@times,t,phis);
%the gradient
gradf = @(w)((sum(bsxfun(@times,expt(w) ./ (1 + expt(w)), tphis),1)'/size(phis,1)) + 2*coef * w');
%test the code now:
%compute the approximate gradient numerically
w0 = randn(1,N+1);
fw = f(w0);
%the numerical:
delta = 1e-6;
eyeN = eye(N+1);
gfw = nan(size(w0));
for iii=1:numel(w0)
    gfw(iii) = (f(w0 + delta * eyeN(iii,:)) - fw) ./ delta;
end
%the 'exact':
truegfw = gradf(w0);
%report
fprintf('max difference between exact and numerical is %g\n',max(abs(truegfw' - gfw)));
When I run this (sorry, I should have set the rand seed), I get:
max difference between exact and numerical is 4.80006e-07
YMMV, depending on the rand seed and the value of delta used.
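To actually make the warning go away, the gradient has to be handed to fminunc. A minimal sketch of one way to wire it up (the function objgrad and its signature are my naming, not part of the original answer): return both the value and the gradient from a single function, and set 'GradObj' to 'on' so the trust-region algorithm can use it.

%save as objgrad.m (hypothetical file name)
function [fval, grad] = objgrad(w, phis, t, coef)
expt = exp(-t .* (phis * w')); %computed once, shared by value and gradient
fval = sum(log(1 + expt))/size(phis, 1) + coef * (w*w');
tphis = -diag(t) * phis;
grad = (sum(bsxfun(@times, expt ./ (1 + expt), tphis), 1)'/size(phis, 1) + 2*coef*w')'; %row vector, matching w
end

%then, in the calling script:
options = optimset('Display', 'notify', 'GradObj', 'on', 'MaxFunEvals', 2e+6, 'MaxIter', 2e+6);
w = fminunc(@(w) objgrad(w, phis, t, coef), ones(1, size(phis, 2)), options);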