I have designed an autoencoder with an encoder and decoder consisting of 2D convolutional layers (the input is 40,000 2D images), and I train it with the Adam optimizer. The autoencoder has the following hyperparameters which I would like to tune (my default values are in brackets; a sketch of this baseline follows the list):
- Number of layers in encoder and decoder (I start with 2 in both the encoder and the decoder)
- Number of filters in the convolutional layers (I start with 32 and 64)
- Convolutional kernel size (I start with 3×3)
- Stride size (I start with 2×2)
- Dropout (I start with 0.25 after each layer)
- Learning rate (0.001)
- learning_rate_decay (0)
- Latent dimension (I start with 8)
- Number of units in the dense layer (the layer before the latent space; I start with 16)
- Batch size (I start with 128)
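For concreteness, here is a minimal Keras sketch of this baseline. The image size and channel count are not fixed above, so the `(64, 64, 1)` input shape is just a placeholder:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed input shape; the actual image size and channels are not stated above.
INPUT_SHAPE = (64, 64, 1)

def build_autoencoder(input_shape=INPUT_SHAPE, filters=(32, 64), kernel_size=3,
                      strides=2, dropout=0.25, dense_units=16, latent_dim=8,
                      learning_rate=1e-3):
    """Convolutional autoencoder matching the default hyperparameters above."""
    # Encoder: stacked stride-2 convolutions with dropout after each layer.
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for f in filters:
        x = layers.Conv2D(f, kernel_size, strides=strides,
                          padding="same", activation="relu")(x)
        x = layers.Dropout(dropout)(x)
    conv_shape = tuple(x.shape[1:])               # shape needed to rebuild in the decoder
    x = layers.Flatten()(x)
    x = layers.Dense(dense_units, activation="relu")(x)  # dense layer before the latent space
    latent = layers.Dense(latent_dim, name="latent")(x)

    # Decoder: mirror of the encoder using transposed convolutions.
    x = layers.Dense(int(np.prod(conv_shape)), activation="relu")(latent)
    x = layers.Reshape(conv_shape)(x)
    for f in reversed(filters):
        x = layers.Conv2DTranspose(f, kernel_size, strides=strides,
                                   padding="same", activation="relu")(x)
        x = layers.Dropout(dropout)(x)
    outputs = layers.Conv2D(input_shape[-1], kernel_size,
                            padding="same", activation="sigmoid")(x)

    model = models.Model(inputs, outputs, name="conv_autoencoder")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    return model
```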
One possibility would be plain grid or random search, but with so many hyperparameters this is very inefficient and takes a long time. Instead, I would like to observe the training and validation loss (using TensorBoard) and adjust the parameters accordingly. For example, the loss curves might indicate overfitting, underfitting, or even an increasing loss.
Are there general rules or hints for how the hyperparameters could be adjusted based on the observed losses or on other criteria?
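For reference, this is roughly how I log the losses (a minimal sketch; `x_train` and `x_val` stand in for my data, and the log directory name is arbitrary):

```python
import datetime
import tensorflow as tf

# Arbitrary log directory, one per run.
log_dir = "logs/ae-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

callbacks = [
    # Writes the train/validation loss curves for inspection in TensorBoard
    # (launched with: tensorboard --logdir logs).
    tf.keras.callbacks.TensorBoard(log_dir=log_dir),
    # Optional: stop once the validation loss no longer improves.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]

# model = build_autoencoder()
# model.fit(x_train, x_train,
#           validation_data=(x_val, x_val),
#           epochs=100, batch_size=128,
#           callbacks=callbacks)
```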
Best Answer
As you have correctly assumed, there are simply too many parameters to perform a full grid search. Luckily, many of them have a relatively small effect compared to the others:
- Convolution kernel size (3×3 works universally quite well)
- Convolution stride (2×2)
- Batch size
- Dropout is not needed for convolutional layers; for dense layers, use it only if you experience overfitting
There are some rules of thumb for setting the learning rate and its decay. A popular one is the learning-rate range test: train briefly while increasing the learning rate and record the loss; this typically yields a curve that stays flat for very small learning rates, drops steeply over an intermediate range, and finally blows up once the learning rate becomes too large.
The region of steepest loss descent hints at a good learning rate choice. Alternatively, optimizers like Adam are fairly robust to the choice of the learning rate (since the amount of change is adaptive for each parameter), but they have their own parameters to tune for optimal performance.
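A minimal sketch of such a range test as a Keras callback (an assumed implementation, not the exact procedure from any particular article): it grows the learning rate exponentially per batch and records the loss so the curve can be plotted afterwards.

```python
import numpy as np
import tensorflow as tf

class LRRangeTest(tf.keras.callbacks.Callback):
    """Exponentially increase the learning rate per batch and record the loss."""

    def __init__(self, min_lr=1e-6, max_lr=1.0, num_steps=1000):
        super().__init__()
        self.lrs = np.geomspace(min_lr, max_lr, num_steps)
        self.history = []          # (learning_rate, batch_loss) pairs
        self.step = 0

    def on_train_batch_begin(self, batch, logs=None):
        lr = float(self.lrs[min(self.step, len(self.lrs) - 1)])
        self.model.optimizer.learning_rate.assign(lr)
        self.current_lr = lr

    def on_train_batch_end(self, batch, logs=None):
        self.history.append((self.current_lr, logs["loss"]))
        self.step += 1

# Usage (x_train assumed as before): afterwards, plot loss against learning rate
# and pick a value in the region where the loss drops most steeply.
# range_test = LRRangeTest()
# model.fit(x_train, x_train, epochs=1, batch_size=128, callbacks=[range_test])
```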
What remains are:
- number of layers,
- number of units per layer,
- latent dimension
These should be easy to tune: start with a small network and increase its size as long as the validation error keeps improving (see the sketch below).
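A minimal sketch of that strategy, reusing the hypothetical build_autoencoder() from the question and assuming x_train / x_val arrays are available:

```python
# Candidate configurations ordered from small to large; stop growing once a
# larger model no longer improves the best validation loss seen so far.
candidates = [
    dict(filters=(32,),         latent_dim=4,  dense_units=8),
    dict(filters=(32, 64),      latent_dim=8,  dense_units=16),   # the defaults
    dict(filters=(32, 64, 128), latent_dim=16, dense_units=32),
]

best_val_loss, best_config = float("inf"), None
for config in candidates:
    model = build_autoencoder(**config)
    history = model.fit(x_train, x_train,
                        validation_data=(x_val, x_val),
                        epochs=30, batch_size=128, verbose=0)
    val_loss = min(history.history["val_loss"])
    print(config, "->", val_loss)
    if val_loss < best_val_loss:
        best_val_loss, best_config = val_loss, config
    else:
        break   # the larger model did not help; stop growing
print("best configuration:", best_config)
```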