Let us assume an image initially has three channels and a spatial size of 227 x 227. In other words, its dimensions are (227, 227, 3).
For the layer sizes given below, how is the second dimension (the number of channels) obtained?
Do we first apply 32 filters to each of the RGB channels, which is why we get 96 (96 = 32 * 3)? Then, in the next convolution layer, do we apply a constant number of filters to each of the 96 channels? If so, why do we get 256 channels when 256 is not divisible by 96?
These are the blob sizes (batch, channels, height, width) of each layer in CaffeNet/AlexNet:
- data: (50, 3, 227, 227)
- conv1: (50, 96, 55, 55)
- pool1: (50, 96, 27, 27)
- norm1: (50, 96, 27, 27)
- conv2: (50, 256, 27, 27)
- pool2: (50, 256, 13, 13)
- norm2: (50, 256, 13, 13)
- conv3: (50, 384, 13, 13)
- conv4: (50, 384, 13, 13)
- conv5: (50, 256, 13, 13)
- pool5: (50, 256, 6, 6)
- fc6: (50, 4096)
- fc7: (50, 4096)
- fc8: (50, 1000)
- prob: (50, 1000)
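The spatial sizes above can be reproduced with a short trace. This is a minimal sketch, assuming the kernel, stride, pad and num_output values of the reference CaffeNet prototxt (conv1: 11x11, stride 4; pooling: 3x3, stride 2; conv2: 5x5, pad 2; conv3 to conv5: 3x3, pad 1). The norm (LRN) layers are left out because they do not change the shape, and `conv_out`, `pool_out` and `trace` are illustrative helpers, not Caffe API. Note that the channel count of each output blob is simply the layer's num_output.

```python
# Sketch: trace CaffeNet/AlexNet blob shapes through the conv/pool stack.
# Hyperparameters are ASSUMED from the reference CaffeNet prototxt.
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution: floor((in + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    # Caffe pooling rounds up: ceil((in + 2*pad - kernel) / stride) + 1
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

# (name, kind, num_output or None for pooling, kernel, stride, pad)
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", None, 3, 2, 0),
]

def trace(batch=50, channels=3, size=227):
    shapes = {"data": (batch, channels, size, size)}
    for name, kind, num_output, k, s, p in LAYERS:
        if kind == "conv":
            channels = num_output           # one output channel per filter
            size = conv_out(size, k, s, p)
        else:
            size = pool_out(size, k, s, p)  # pooling keeps the channel count
        shapes[name] = (batch, channels, size, size)
    return shapes

for name, shape in trace().items():
    print(name, shape)
```

The final (256, 6, 6) pool5 output is what fc6 flattens into 256 * 6 * 6 = 9216 inputs.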
In a similar fashion, how do we relate the number of filters to the second dimension of the weight blobs (3, 48, 256, ...)?
- conv1: weights (96, 3, 11, 11), bias (96,)
- conv2: weights (256, 48, 5, 5), bias (256,)
- conv3: weights (384, 256, 3, 3), bias (384,)
- conv4: weights (384, 192, 3, 3), bias (384,)
- conv5: weights (256, 192, 3, 3), bias (256,)
- fc6: weights (4096, 9216), bias (4096,)
- fc7: weights (4096, 4096), bias (4096,)
- fc8: weights (1000, 4096), bias (1000,)
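The parameter shapes above follow the pattern (num_output, C_in / group, k, k) for convolutions and (num_output, C * H * W) for fully connected layers. A minimal sketch, assuming group: 2 for conv2, conv4 and conv5 as in the reference CaffeNet prototxt (a leftover of the original two-GPU AlexNet split); `conv_weight_shape` and `fc_weight_shape` are hypothetical helpers:

```python
# Sketch: reproduce Caffe's weight blob shapes (num_output, C_in / group, k, k).
# The "group" values are ASSUMED from the reference CaffeNet prototxt.

# (name, C_in, num_output, kernel, group)
CONV_LAYERS = [
    ("conv1", 3, 96, 11, 1),
    ("conv2", 96, 256, 5, 2),
    ("conv3", 256, 384, 3, 1),
    ("conv4", 384, 384, 3, 2),
    ("conv5", 384, 256, 3, 2),
]

def conv_weight_shape(c_in, num_output, kernel, group=1):
    # Input channels and filters are both split into `group` equal parts;
    # each filter only spans the input channels of its own part.
    assert c_in % group == 0 and num_output % group == 0
    return (num_output, c_in // group, kernel, kernel), (num_output,)

def fc_weight_shape(in_features, num_output):
    # A fully connected layer sees its input blob flattened to C * H * W.
    return (num_output, in_features), (num_output,)

shapes = {name: conv_weight_shape(c, n, k, g) for name, c, n, k, g in CONV_LAYERS}
shapes["fc6"] = fc_weight_shape(256 * 6 * 6, 4096)  # pool5 output is (256, 6, 6)
shapes["fc7"] = fc_weight_shape(4096, 4096)
shapes["fc8"] = fc_weight_shape(4096, 1000)

for name, (w, b) in shapes.items():
    print(name, w, b)
```

Without grouping, the second dimension would simply equal the number of channels of the previous layer's output (96, 384, 384), which is what conv3 shows with 256.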
Best Answer
If you have a 5 x 5 filter in the conv1 layer and your input has 3 channels, then that filter has 5 * 5 * 3 = 75 weights (plus a bias term). In other words, each filter spans the entire depth (all channels) of the preceding layer.
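The same arithmetic extends to a whole layer: each filter carries k * k * C_in weights plus one bias, and the layer holds one filter per output channel. A small sketch (the helper names are hypothetical, and grouping is ignored):

```python
# Sketch: count the learnable parameters of a (non-grouped) conv layer.

def params_per_filter(kernel, c_in, bias=True):
    # One filter covers a kernel x kernel window across ALL c_in channels.
    return kernel * kernel * c_in + (1 if bias else 0)

def conv_layer_params(kernel, c_in, num_filters):
    # One filter per output channel.
    return num_filters * params_per_filter(kernel, c_in)

print(params_per_filter(5, 3))       # 5*5*3 weights + 1 bias
print(conv_layer_params(11, 3, 96))  # e.g. a conv1 with 96 filters of 11x11x3
```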
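The number of channels in the output of any layer is equal to the number of filters in that layer. Therefore the conv1 layer, which has 96 filters, produces an output with 96 channels, and conv2, with 256 filters, produces 256 channels regardless of how many channels its input has. The second dimension of each weight blob is the number of input channels a filter spans; it is 48 rather than 96 for conv2 (and 192 rather than 384 for conv4 and conv5) because CaffeNet runs those layers as grouped convolutions with two groups, so each filter sees only half of the input channels.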
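To see why the output channel count equals the number of filters, here is a naive, dependency-free sketch of a multi-channel convolution (stride 1, no padding, no grouping, kernel not flipped, as deep learning frameworks commonly compute it). `conv2d` is a hypothetical helper, not a library function:

```python
# Sketch: naive multi-channel "valid" convolution.
# Input x is [C][H][W]; filters are [F][C][k][k]; biases are [F].

def conv2d(x, filters, biases):
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(filters[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    out = []
    for f, b in zip(filters, biases):
        assert len(f) == c_in  # every filter spans the full input depth
        fmap = [[b] * out_w for _ in range(out_h)]
        for i in range(out_h):
            for j in range(out_w):
                # Sum over ALL input channels and the k x k window,
                # so one filter yields exactly one 2-D output map.
                fmap[i][j] += sum(
                    f[c][u][v] * x[c][i + u][j + v]
                    for c in range(c_in)
                    for u in range(k)
                    for v in range(k)
                )
        out.append(fmap)
    return out  # [F][out_h][out_w]

# 3-channel 6x6 input of ones, 4 filters of shape 3x3x3 filled with ones.
x = [[[1.0] * 6 for _ in range(6)] for _ in range(3)]
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(4)]
y = conv2d(x, filters, biases=[0.0] * 4)
print(len(y), len(y[0]), len(y[0][0]))  # 4 output channels, each 4x4
print(y[0][0][0])                       # 3*3*3 products of ones summed
```

Changing the number of filters changes the number of output maps; the number of input channels only changes the depth of each filter.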