I am using neural networks for my mini research project, and I found that the current trick for feed-forward NNs is to use dropout for regularization instead of the L1/L2 norm, with the rectified linear unit (ReLU) as the activation function.
But when I tried it, I always got worse results than a standard NN with a sigmoid or hyperbolic tangent activation function.
Is there some rule of thumb or trick for training a dropout + ReLU NN?
I am posting quite late, but I wanted to provide an answer just in case someone else has this problem.
Check that you are turning off dropout when you evaluate on the validation/test set, or when you compute error on the training set. Dropout was designed with the express intent of reducing overfitting by injecting noise during training, so if you evaluate training loss with dropout still turned on, you will see an artificially higher error.
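To make the train/eval distinction concrete, here is a minimal NumPy sketch of "inverted" dropout, where activations are scaled up at training time so that evaluation needs no rescaling. The function name `dropout_forward` and the `train` flag are illustrative, not any particular framework's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p_drop=0.5, train=True):
    """Inverted dropout: zero units with prob p_drop during training,
    scaling survivors by 1/(1-p_drop); at evaluation, pass x through unchanged."""
    if not train:
        return x  # evaluation: deterministic forward pass, no noise
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

x = np.ones((4, 3))
train_out = dropout_forward(x, train=True)   # units are either zeroed or scaled up
eval_out = dropout_forward(x, train=False)   # identical to x
```

The point of the scaling is that the expected activation is the same in both modes, so the same weights work at training and evaluation time.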
For those familiar with the Lasagne framework built on top of Theano, there is lasagne.layers.get_output(net, deterministic=True), which does a deterministic forward pass, turning off dropout and not performing any other sort of noise injection.