Solved – Number of filters in a CNN following RGB data layer

Let us assume an image initially has three channels and is of size 227 x 227. In other words, its dimensions are (227, 227, 3).

For the layer sizes given below, how is the second dimension (the number of channels) obtained?

Do we first apply 32 filters to each of the RGB channels, and is that why we get 96 (96 = 32*3)? Then, in the next convolution layer, do we apply a constant number of filters to each of the 96 channels? If so, why do we have 256 channels, given that 256 is not divisible by 96?

These are the sizes of each layer in the CaffeNet/AlexNet:

data    (50, 3, 227, 227)
conv1   (50, 96, 55, 55)
pool1   (50, 96, 27, 27)
norm1   (50, 96, 27, 27)
conv2   (50, 256, 27, 27)
pool2   (50, 256, 13, 13)
norm2   (50, 256, 13, 13)
conv3   (50, 384, 13, 13)
conv4   (50, 384, 13, 13)
conv5   (50, 256, 13, 13)
pool5   (50, 256, 6, 6)
fc6     (50, 4096)
fc7     (50, 4096)
fc8     (50, 1000)
prob    (50, 1000)

In a similar fashion, how do we relate the number of filters to the second dimension of the parameter blobs (3, 48, 256, ..)?

conv1   (96, 3, 11, 11)     (96,)
conv2   (256, 48, 5, 5)     (256,)
conv3   (384, 256, 3, 3)    (384,)
conv4   (384, 192, 3, 3)    (384,)
conv5   (256, 192, 3, 3)    (256,)
fc6     (4096, 9216)        (4096,)
fc7     (4096, 4096)        (4096,)
fc8     (1000, 4096)        (1000,)

If you have a 5 x 5 filter in the conv1 layer and your input layer has 3 channels, then that filter will have 5*5*3 = 75 weights (plus a bias term). In other words, each filter spans the entire depth (all channels) of the preceding layer, rather than being applied to each channel separately.
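The weight count above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `filter_weight_count` is made up for this example and is not part of Caffe or any other library.

```python
def filter_weight_count(kernel_h, kernel_w, in_channels):
    """Weights in one conv filter that spans every input channel (bias excluded)."""
    return kernel_h * kernel_w * in_channels


# The 5 x 5 example from the text, on a 3-channel (RGB) input.
print(filter_weight_count(5, 5, 3))    # 75

# conv1 as listed in the parameter blob (96, 3, 11, 11): 11 x 11 filters over 3 channels.
print(filter_weight_count(11, 11, 3))  # 363
```

Each filter also carries one bias term, which is why the bias blob for conv1 has shape (96,): one bias per filter.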

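The number of channels in the output of any layer is equal to the number of filters in that layer. Therefore conv1, which has 96 filters that each span all three input channels, produces an output with 96 channels; the 96 does not come from 32 filters per RGB channel.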
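The rule can be traced through the five convolution layers with a short sketch. The filter counts and kernel sizes are taken from the blobs listed in the question. The `groups` values are an assumption: the post does not state them, but a second dimension of 48 (= 96 / 2) or 192 (= 384 / 2) is what you would see if conv2, conv4 and conv5 used grouped convolution with 2 groups, where each filter only sees half of the input channels. The helper `conv_weight_shape` is hypothetical and written only for this illustration.

```python
def conv_weight_shape(num_filters, in_channels, kernel, groups=1):
    """Shape of a conv weight blob: (filters, channels seen per filter, k, k)."""
    return (num_filters, in_channels // groups, kernel, kernel)


# (name, number of filters, kernel size, groups); the groups column is assumed.
layers = [
    ("conv1", 96, 11, 1),
    ("conv2", 256, 5, 2),
    ("conv3", 384, 3, 1),
    ("conv4", 384, 3, 2),
    ("conv5", 256, 3, 2),
]

shapes = {}
in_channels = 3  # RGB input
for name, num_filters, kernel, groups in layers:
    shapes[name] = conv_weight_shape(num_filters, in_channels, kernel, groups)
    print(name, shapes[name], "-> output channels:", num_filters)
    # The next layer's input depth is this layer's number of filters.
    in_channels = num_filters
```

Under that assumption the computed shapes match the parameter blobs in the question, and the output channel count of every layer is simply its number of filters, so 256 does not need to be divisible by 96.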
