It is not clear to me when to use the flatten operation when building convnets.
Is it always necessary to include a flatten operation after a set of 2D convolutions (and pooling)?
For example, consider these two models for binary classification. They take as input a 2D numerical matrix of 2 rows and 15 columns, and they produce as output a vector with two positions (positive and negative).
Model 1:
model = keras.models.Sequential([ keras.Input(shape=(2,15,1)), keras.layers.Conv2D(32, kernel_size=(2, 1), activation="relu"), keras.layers.Flatten(), keras.layers.Dense(100, activation="relu"), keras.layers.Dropout(0.2), keras.layers.Dense(100, activation="relu"), keras.layers.Dropout(0.2), keras.layers.Dense(2, activation="softmax") ])
Model 2:
model = keras.models.Sequential([ keras.Input(shape=(2,15,1)), keras.layers.Conv2D(32, kernel_size=(2, 1), activation="relu"), keras.layers.Dense(100, activation="relu"), keras.layers.Dropout(0.2), keras.layers.Dense(100, activation="relu"), keras.layers.Dropout(0.2), keras.layers.Dense(2, activation="softmax") ])
What is the difference between the two? Do they have the same capacity?
Best Answer
The best way to see what's going on in your models (not restricted to Keras) is to print the model summary. In Keras/TensorFlow, you can do that via model.summary(). For the second (unflattened) model, it prints the following:
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 1, 15, 32)         96
_________________________________________________________________
dense_9 (Dense)              (None, 1, 15, 100)        3300
_________________________________________________________________
dropout_6 (Dropout)          (None, 1, 15, 100)        0
_________________________________________________________________
dense_10 (Dense)             (None, 1, 15, 100)        10100
_________________________________________________________________
dropout_7 (Dropout)          (None, 1, 15, 100)        0
_________________________________________________________________
dense_11 (Dense)             (None, 1, 15, 2)          202
=================================================================
Total params: 13,698
Trainable params: 13,698
Non-trainable params: 0
So, the output has dimension 1 x 15 x 2, unlike the flattened version, whose output dimension is simply 2. For the first model, the summary is:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_2 (Conv2D)            (None, 1, 15, 32)         96
_________________________________________________________________
flatten_1 (Flatten)          (None, 480)               0
_________________________________________________________________
dense_6 (Dense)              (None, 100)                48100
_________________________________________________________________
dropout_4 (Dropout)          (None, 100)               0
_________________________________________________________________
dense_7 (Dense)              (None, 100)               10100
_________________________________________________________________
dropout_5 (Dropout)          (None, 100)               0
_________________________________________________________________
dense_8 (Dense)              (None, 2)                 202
=================================================================
Total params: 58,498
Trainable params: 58,498
Non-trainable params: 0
Clearly, their capacities are different; the parameter counts alone show it. The unflattened version carries the spatial structure through to the end: the first Dense layer is applied independently at each of the 15 positions, mapping the 32 channels (33 inputs counting the bias) to 100 neurons, which gives 33 x 100 = 3,300 parameters. In the flattened version, that Dense layer instead sees a single vector of 480 entries (481 with the bias), so the same stage has 481 x 100 = 48,100 parameters.
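As a sanity check, these counts can be reproduced by hand. A minimal sketch (plain Python, no Keras needed; the variable names are just for illustration):

# Conv2D parameters: (kernel_h * kernel_w * in_channels + 1 bias) * filters
conv_params = (2 * 1 * 1 + 1) * 32            # = 96, same in both models

# Unflattened model: Dense acts only on the last axis (32 channels),
# with the same weights shared across the 1 x 15 positions.
dense_unflattened = (32 + 1) * 100            # = 3300

# Flattened model: Dense sees one vector of 1 * 15 * 32 = 480 values.
dense_flattened = (1 * 15 * 32 + 1) * 100     # = 48100

print(conv_params, dense_unflattened, dense_flattened)   # 96 3300 48100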
Therefore, the two models are quite different.
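The mismatch also shows up at prediction time: Keras applies a Dense layer to the last axis only, so the unflattened model emits a separate 2-way softmax at each of the 15 positions rather than one prediction per sample. A minimal sketch, assuming the two models above have been bound to the hypothetical names model_flat and model_unflat:

import numpy as np

x = np.random.rand(4, 2, 15, 1)       # dummy batch of 4 samples

print(model_flat.predict(x).shape)    # (4, 2): one softmax per sample
print(model_unflat.predict(x).shape)  # (4, 1, 15, 2): one softmax per position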