I have a validation curve where, for each epoch, I plot my validation score and training score. I use the Adam optimizer, and lowering my learning rate doesn't give me smoother curves; only when I lower it drastically do I get smoother results, but then training takes very long. Are there any other tricks for getting a smoother validation accuracy curve?
Best Answer
The amount of "jitter" in the validation curve will depend to some extent on minibatch size: larger minibatches will result in a smoother curve, while smaller minibatches will result in more "jitter". Think about it like this: at the extremes, if you were to use your whole training set as one minibatch, you would get an extremely smooth curve, whereas if you use only one example per minibatch (stochastic gradient descent), you would get a lot of fluctuation. Using larger minibatch sizes can, however, under some circumstances contribute to overfitting. The learning rate, as you observed, has a similar effect: a smaller learning rate should correspond to a smoother curve (and may likewise, under some circumstances, contribute to overfitting).
But I think it's really important to ask: why do you care how smooth your validation curve is? Achieving a smooth validation curve isn't really your main goal. Obviously, if the epoch-to-epoch validation accuracy is fluctuating wildly, then you're not converging, and you probably do need a larger minibatch size or a lower learning rate; but unless that's the problem, I wouldn't worry too much about exactly how smooth the curve is. The bigger concerns are whether you are achieving convergence (and if not, I can see why that would be a concern) and the extent of overfitting.