Solved – How to analyse the effect of changing parameters in a neural network

I have built a simple feed-forward classification neural network with one hidden layer, trained with backpropagation (code below), and I would like to explore the effect of changing:
– the learning rate
– the number of nodes in the hidden layer, and
– the order in which the training examples are presented.

I have found some general comments here that the larger the learning rate, the quicker the network learns, but that too large a learning rate risks an unstable solution: the learning step overshoots and never lands on the global minimum. Also, according to this post, if one has more nodes in the hidden layer, local minima become less of a problem.
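To illustrate what I mean by an unstable solution, here is a minimal toy sketch (separate from my network code) of plain gradient descent on f(w) = w^2, where any step size above 1 makes the iterates grow instead of shrink:

% Toy illustration, not the network code: gradient descent on f(w) = w^2.
% The update is w <- w - eta*2*w = (1 - 2*eta)*w, so any eta > 1 gives
% |1 - 2*eta| > 1 and the iterates grow instead of shrinking towards 0.
w_small = 1; w_large = 1;
for k = 1:20
    w_small = w_small - 0.1*2*w_small;   % stable step size: converges to 0
    w_large = w_large - 1.1*2*w_large;   % step size too large: diverges
end
fprintf('eta = 0.1: w = %.3g   eta = 1.1: w = %.3g\n', w_small, w_large);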

However, I wasn't able to find resources explaining how to arrive at an optimal value for these three parameters without just aimlessly trying different values and hoping to land on the right one. Would anyone be able to suggest an approach?

clear

% Set up parameters
nInput = 4;          % number of nodes in the input layer
nOutput = 1;         % number of nodes in the output layer
nHiddenLayer = 7;    % number of nodes in the hidden layer
nTrain = 1000;       % size of the training set
epsilon = 0.01;      % learning rate

% Set up the inputs: random coefficients between -1 and 1
trainExamples = 2*rand(nInput,nTrain)-1;
trainExamples(nInput,:) = ones(1,nTrain);  % set the last input to be 1

% Set up the student neurons for both the hidden and the output layers
S1(nHiddenLayer,nTrain) = 0;
S2(nOutput,nTrain) = 0;

% The student starts with random weights from both the input and the hidden layers
w1 = rand(nInput,nHiddenLayer);
w2 = rand(nHiddenLayer+1,nOutput);

% Calculate the teacher outputs according to the quadratic formula
T = sign(trainExamples(2,:).^2 - 4*trainExamples(1,:).*trainExamples(3,:));

% Initialise values for looping
nEpochs = 0;
nWrong = nTrain*0.01;
Wrong = [];
Epoch = [];

while (nWrong >= nTrain*0.01) % as long as at least 1% of the outputs are wrong
    for i = 1:nTrain
        x = trainExamples(:,i);
        S1(1:nHiddenLayer,i) = w1'*x;
        S2(:,i) = w2'*[tanh(S1(:,i)); 1];
        delta1 = tanh(S2(:,i)) - T(:,i);                                  % back propagate
        delta2 = (1 - tanh(S1(:,i)).^2).*(w2(1:nHiddenLayer,:)*delta1);   % back propagate
        w1 = w1 - epsilon*x*delta2';                                      % update
        w2 = w2 - epsilon*[tanh(S1(:,i)); 1]*delta1';                     % update
    end

    outputNN = sign(tanh(S2));
    delta = outputNN - T;        % difference between student and teacher
    nWrong = sum(abs(delta/2));
    nEpochs = nEpochs + 1;
    Wrong = [Wrong nWrong];
    Epoch = [Epoch nEpochs];
end
plot(Epoch,Wrong);

Well, "aimlessly trying different values" is of course not a very good strategy. The main things to tune (in my experience) are the number of hidden layers, the number of neurons in each hidden layer, and the learning rate. I would vary one parameter at a time and plot the test error and the training error to see which learning rate makes the network converge best.
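For instance, a minimal sketch of such a sweep over the learning rate, assuming the training loop from the question has been wrapped in a hypothetical helper trainAndEvaluate(epsilon, nHidden) that returns the training and test error for those settings:

% Sweep the learning rate while keeping the other parameters fixed.
% trainAndEvaluate is a hypothetical wrapper around the training loop above;
% it is assumed to return [trainErr, testErr] for the given settings.
learningRates = [0.001 0.003 0.01 0.03 0.1];
nHidden = 7;                          % keep the hidden-layer size fixed
trainErr = zeros(size(learningRates));
testErr  = zeros(size(learningRates));

for k = 1:numel(learningRates)
    [trainErr(k), testErr(k)] = trainAndEvaluate(learningRates(k), nHidden);
end

semilogx(learningRates, trainErr, '-o', learningRates, testErr, '-x');
xlabel('learning rate'); ylabel('error');
legend('training error', 'test error');

The same loop works for the hidden-layer size or the presentation order; just hold the other parameters fixed while you vary it.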

Edit: As mentioned by bayerj, there are a couple of misleading statements in this answer. The first concerns plotting the test error: you should of course use a training/validation/test split (often 80%/10%/10%) of your data and then plot the validation error, not the test error. I left this out of my original answer since it was supposed to be a comment.
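A minimal sketch of such a split, assuming the column-wise layout of trainExamples and T from the question (the new Xtrain/Xval/Xtest names are just for illustration):

% One way to make an 80/10/10 split of the nTrain examples (columns).
idx  = randperm(nTrain);              % shuffle the example indices
nTr  = round(0.8*nTrain);
nVal = round(0.1*nTrain);

trainIdx = idx(1:nTr);
valIdx   = idx(nTr+1:nTr+nVal);
testIdx  = idx(nTr+nVal+1:end);

Xtrain = trainExamples(:, trainIdx);  Ttrain = T(trainIdx);
Xval   = trainExamples(:, valIdx);    Tval   = T(valIdx);
Xtest  = trainExamples(:, testIdx);   Ttest  = T(testIdx);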

The other thing is my statement that "aimlessly trying different values" is a bad strategy. I did not mean that a randomized strategy for optimizing performance is a bad idea, only that that is not how I read "aimlessly trying different values". If you implement a simple randomized search for good parameter values, the article linked in bayerj's answer shows that it performs very well.
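A minimal sketch of that kind of randomized search, again assuming a hypothetical helper trainAndValidate(epsilon, nHidden) that trains with the given settings and returns the error on the validation set:

% Random search over the learning rate and the hidden-layer size.
nTrials = 50;
bestErr = Inf;
bestEps = NaN;
bestHid = NaN;

for k = 1:nTrials
    lr   = 10^(-3 + 2*rand);          % log-uniform in [1e-3, 1e-1]
    nHid = randi([2 20]);             % uniform over 2..20 hidden nodes
    err  = trainAndValidate(lr, nHid);
    if err < bestErr
        bestErr = err;  bestEps = lr;  bestHid = nHid;
    end
end

fprintf('best: epsilon = %.4g, hidden nodes = %d, validation error = %.3f\n', ...
        bestEps, bestHid, bestErr);

Sampling the learning rate on a log scale is the usual choice, since its useful values span several orders of magnitude.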
