I was reading MatConvNet's tutorial for (convolutional) deep learning and it said:
"…the receptive field size for the layer. This is the size (in
pixels) of the local image region that affects a particular element in
a feature map."
which matches the traditional definition of a receptive field: it is usually thought of as the number of pixels that affect a particular node in the feature map. However, when I went to do the exercise, they had the following table:
     layer|    0|    1|    2|    3|    4|         5|
      type|input| conv| relu| conv| relu|      conv|
      name|  n/a|conv1|relu1|conv2|relu2|prediction|
----------|-----|-----|-----|-----|-----|----------|
   support|  n/a|    3|    1|    3|    1|         3|
  filt dim|  n/a|    1|  n/a|   32|  n/a|        32|
 num filts|  n/a|   32|  n/a|   32|  n/a|         1|
    stride|  n/a|    1|    1|    1|    1|         1|
       pad|  n/a|    1|    0|    1|    0|         1|
----------|-----|-----|-----|-----|-----|----------|
   rf size|  n/a|    3|    3|    5|    5|         7|
where we can see that the second convolution layer (layer 3) has an rf size (receptive field size) of 5. I was wondering, how did they get that number? I thought that the receptive field just referred to the size of the input region used to compute one element of a feature map, i.e. the same as the filter size of that convolution layer (though I am aware the concept can be extended back through lower layers, as explained in chapter 9 of the Deep Learning book by Bengio, Goodfellow, and Courville). Regardless, even with the extended definition in mind, I am still unsure how the number 5 was obtained for layer 3. Any ideas?
Best Answer
Receptive field refers to the pixels in the input image which contribute to a feature in any layer of a network.
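Layer 1: Each point in the feature map is computed from a 3×3 patch of pixels in the input image, so the RF size is 3.
Layer 3: Each point in the feature map is computed from a 3×3 patch of the layer 1 feature map (the ReLU in layer 2 has support 1, so it does not change the RF). Each of those layer 1 points in turn maps to a 3×3 patch of input pixels. Because the stride is 1, neighbouring layer 1 points see input patches shifted by one pixel, so three of them side by side cover 3 + 2 = 5 pixels. Mapped back to the input, that is a 5×5 patch.
Similarly, for layer 5 you get a 7×7 patch (5 + 2 = 7).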
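The layer-by-layer reasoning above can be written as a small recurrence: each layer grows the receptive field by (support - 1) times the spacing, in input pixels, between neighbouring features of the layer below, and that spacing is multiplied by the layer's stride. Below is a minimal Python sketch of this, assuming no dilation; the function name `receptive_fields` and the `(support, stride)` tuple format are made up for illustration and are not part of MatConvNet. Padding does not enter the calculation, since it only changes where the field sits, not its size.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (support, stride) tuples ordered from the first
    layer after the input to the last. Hypothetical helper for illustration.
    """
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between adjacent features
    sizes = []
    for support, stride in layers:
        # A window of `support` features spans (support - 1) gaps of `jump` pixels.
        rf += (support - 1) * jump
        # Striding spreads the next layer's features further apart in the input.
        jump *= stride
        sizes.append(rf)
    return sizes


# (support, stride) for layers 1..5, taken from the table in the question.
layers = [(3, 1), (1, 1), (3, 1), (1, 1), (3, 1)]
print(receptive_fields(layers))  # [3, 3, 5, 5, 7]
```

The printed values match the "rf size" row of the table. With strides larger than 1 the growth is faster, because `jump` is no longer 1 for the later layers.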