Solved – What are the benefits of layer-specific learning rates

I've read about using different learning rates for different layers of neural networks instead of using the same global learning rate for each layer.

What's the need for using these different learning rates specific to each layer?

Here is a great article on this topic: Basically, it usually makes training faster. When you fine-tune a pretrained network, the first layers have typically already learned general features that are good enough, so their learning rate can be lower. The last layers need to be tuned for our dataset, so their learning rate should be higher.
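To make the idea concrete, here is a minimal sketch in plain Python rather than any particular framework. The helper names (`layer_learning_rates`, `sgd_step`), the toy weights, and the decay factor of 0.1 are all assumptions made up for illustration; they are not taken from the article. It gives the last layer the largest learning rate and shrinks the rate for each earlier layer, then applies one plain SGD step.

```python
# Hypothetical helpers for illustration only; no deep-learning library needed.

def layer_learning_rates(num_layers, top_lr, decay):
    """Return one learning rate per layer, ordered first layer to last.

    The last layer gets top_lr; each earlier layer gets the next
    layer's rate multiplied by decay (assumed 0 < decay <= 1).
    """
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def sgd_step(weights, grads, lrs):
    """Apply one plain SGD update with a separate learning rate per layer."""
    return [
        [w - lr * g for w, g in zip(layer_w, layer_g)]
        for layer_w, layer_g, lr in zip(weights, grads, lrs)
    ]


# Three toy "layers", each a flat list of weights (made-up values).
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
# The same gradient everywhere, so only the learning rate differs.
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

lrs = layer_learning_rates(num_layers=3, top_lr=0.1, decay=0.1)
new_weights = sgd_step(weights, grads, lrs)

for i, (lr, layer) in enumerate(zip(lrs, new_weights)):
    print(f"layer {i}: lr={lr:g}, weights={layer}")
```

With identical gradients, the first layer barely moves while the last layer changes the most, which is the behaviour described above. In a real framework you would normally not write the update yourself. For example, PyTorch optimizers accept a list of parameter groups, each with its own `lr`.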
