Recently I started implementing my own Convolutional Neural Network, and I have a few questions. I will refer to an example throughout, so that we all stay on the same page. Suppose:
- Input: 64X64X1 (gray channel only) -> Output: 64X64X1
- C1: 5X5X6, i.e. 6 conv maps, each of size 5X5 -> Output: 60X60X6
- P1: max-pooling, non-overlapping, size 2X2 -> Output: 30X30X6
- C2: 9X9X8, i.e. 8 conv maps, each of size 9X9 -> Output: 22X22X48 // subject to change
- P2: max-pooling, non-overlapping, size 2X2 -> Output: 11X11X48 // subject to change
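As a sanity check on the sizes above, here is a minimal sketch, assuming 'valid' convolutions (no padding, stride 1) and non-overlapping pooling; the helper names are just for illustration:

```python
def conv_out(size, kernel):
    return size - kernel + 1      # 'valid' convolution, stride 1

def pool_out(size, window):
    return size // window         # non-overlapping pooling

s = 64                            # input is 64X64X1
s = conv_out(s, 5)                # C1: 5X5 kernels -> 60
s = pool_out(s, 2)                # P1: 2X2 pooling -> 30
s = conv_out(s, 9)                # C2: 9X9 kernels -> 22
s = pool_out(s, 2)                # P2: 2X2 pooling -> 11
print(s)                          # 11
```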
OK, now here are the questions:
- ReLU
As I understand it, ReLU is applied to every neuron. That is, in C1, first the 5X5 patch is moved over the input, then the sum of the convolution has to pass through the transfer function, and there is no transfer function at the pooling layer. Am I correct in understanding it? Which function should I use as the transfer function? Softplus? The noisy one? The leaky one? Also, the same transfer function should be used for the feed-forward part, right? Or can I change to sigmoid there?
- Convolution-Feature_Map Connections
How do I carry out the next convolution? The P1 layer has 6 maps of 30X30. There are going to be 8 convolutional kernels, each of size 9X9, but I have never seen this producing 6*8 maps. Specifically, LeNet has an output of 16 maps. How to produce those maps is given in this paper on page 8. After reading it again and again I still do not get how to generate the next feature maps. Are they doing it like this?
- Also, isn't the method mentioned in the paper specific to OCR? I am very confused about how to write a program for this in a user-friendly way. For example, if I want to see the output of a different architecture, how do I define these rules of connections programmatically?
I definitely did not understand the "It forces a break of symmetry ..." part of the above-mentioned paper. Please elaborate if you can; I am not able to visualize the symmetry problem here.
- About Bias
Initially I thought of the bias as a window of kernel size, but now I think it is just a number between 0 and 1. But how do I add a bias? If I treat the kernel as a matrix, say 5X5, how can I possibly add a single number to a matrix? We get the sum after the convolution; I think I am supposed to add the bias to this sum and then apply the transfer function. Right?
Best Answer
Convolution with a kernel is done on all input maps and the results are summed. At the input layer this is obvious, since there is only one feature (input) map. After the first convolution, however, the later convolutions are sums of the kernel operation over all feature maps. Hence, instead of 48 output feature maps at C2, there should be 8 maps. This link explains the network and its back-prop in a clear way.
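As a rough illustration of that summation (not taken from the linked page; the array shapes simply mirror the P1/C2 example above, and scipy.signal.convolve2d stands in for the convolution):

```python
import numpy as np
from scipy.signal import convolve2d

in_maps  = np.random.randn(6, 30, 30)   # P1 output: 6 maps of 30X30
kernels  = np.random.randn(8, 6, 9, 9)  # 8 kernels, each with one 9X9 slice per input map
out_maps = np.zeros((8, 22, 22))        # 22 = 30 - 9 + 1

for k in range(8):                      # one output map per kernel
    for c in range(6):                  # sum over ALL input maps
        out_maps[k] += convolve2d(in_maps[c], kernels[k, c], mode='valid')

print(out_maps.shape)                   # (8, 22, 22) -- 8 maps, not 48
```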
Use $f(x) = \max(0, x)$ as the activation (transfer) function. After a successful implementation, you can try the others too. You should use the same function for both the 'feedforward' and 'back-prop' passes.
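A minimal sketch of that activation and the matching derivative used during back-prop (plain NumPy; the names are my own):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise to the pre-activation sums
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative used in back-prop: 1 where x > 0, 0 elsewhere
    return (x > 0).astype(x.dtype)
```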
I haven't read the paper, but breaking symmetry is about selecting the weights from a random distribution. If the weights on the feature maps start out the same, the back-propagated error will be the same, and as a result the network learns the same filters, which is not desirable.
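For illustration, one common way to break symmetry is to draw the initial kernels from a small random distribution; the fan-in scaling below is just a common heuristic, not something taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric (all-equal) initialization: every kernel receives the same
# gradient, so all 8 kernels would stay identical during training.
bad_kernels = np.zeros((8, 6, 9, 9))

# Breaking symmetry: small random values, here scaled by the fan-in
fan_in = 6 * 9 * 9
good_kernels = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(8, 6, 9, 9))
```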
The rules of connection are already defined as mathematical expressions. The number of kernels, the number of layers, the kernel size, etc. should be defined symbolically, and they should be assigned in the main section of the code.
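For example, one hypothetical way to keep this user-friendly is to describe the architecture as plain data and allocate the kernels from that description (the structure and names here are my own, not from any particular library):

```python
import numpy as np

# The architecture is just data: a different network only needs a different list.
architecture = [
    {"type": "conv", "kernels": 6, "size": 5},   # C1
    {"type": "pool", "window": 2},               # P1
    {"type": "conv", "kernels": 8, "size": 9},   # C2
    {"type": "pool", "window": 2},               # P2
]

def build_kernels(arch, in_maps=1):
    """Allocate one randomly initialized kernel stack per conv layer."""
    kernel_stacks = []
    for spec in arch:
        if spec["type"] == "conv":
            shape = (spec["kernels"], in_maps, spec["size"], spec["size"])
            kernel_stacks.append(np.random.randn(*shape) * 0.01)
            in_maps = spec["kernels"]
    return kernel_stacks

stacks = build_kernels(architecture)
print([s.shape for s in stacks])   # [(6, 1, 5, 5), (8, 6, 9, 9)]
```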
You should add the bias before applying the activation function. Most commonly, a single bias is added per feature map. Summing a scalar with a matrix simply means adding the scalar at every index of the matrix.
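A small sketch of that order of operations for one feature map (a single scalar bias, then ReLU; the sizes match the C2 example and the values are random):

```python
import numpy as np

pre_activation = np.random.randn(22, 22)   # sum of the convolutions for one C2 map
bias = 0.1                                 # a single scalar bias for this feature map

# Broadcasting adds the scalar to every entry of the matrix,
# then the activation is applied element-wise.
feature_map = np.maximum(0.0, pre_activation + bias)
```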
If you haven't written code for a plain NN before, it would be better to start with that.