I'm following tutorials on recognizing handwritten digits and on object detection using a CNN with the Keras library, from these sources:
Digit Recognition and Object Detection, respectively.
Now in digit recognition, they flatten the training and testing data sets from the original 2D structure of 28*28 pixels into 1D vectors of 784 pixels, using the following code:

num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
But in object detection, no such flattening is done before training. So my question is: how do we know when to flatten the data set and when we need not? The data set used in digit recognition is MNIST, whereas the data set for object detection is CIFAR-10.

I'm a newbie in Machine Learning and Neural Networks. Thanks
Best Answer
That is because the first example in the digit recognition tutorial does not use a CNN structure. Rather, it uses a simple MLP with one hidden layer, built with the code below.
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
As a result, an MLP with 784-784-10 nodes is created.
When the CNN structure is used in the following example, they reshape the X data into (n, 1, 28, 28): a channels-first layout that keeps the 28x28 pixel grid intact and adds a single grayscale channel axis.

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][channels][width][height]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
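The two reshapes can be compared side by side with a small NumPy sketch; the zero-filled array below is a hypothetical stand-in for the MNIST images, used only to show the shapes involved:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 grayscale images of 28x28 pixels
X = np.zeros((5, 28, 28), dtype='float32')

# For the MLP: flatten each image into a 784-element vector
X_mlp = X.reshape(X.shape[0], 28 * 28)
print(X_mlp.shape)   # (5, 784)

# For the CNN (channels-first, as in the snippet above): keep the 2D grid
# and add an explicit channel axis
X_cnn = X.reshape(X.shape[0], 1, 28, 28)
print(X_cnn.shape)   # (5, 1, 28, 28)
```

Both arrays hold exactly the same pixel values; only the shape the model sees differs.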
In short, because the digit recognition tutorial contains two examples using different model structures, two data shapes are employed.

In general, 1D flattened data are used as inputs to fully-connected layers (e.g., MLPs), while 2D grid-shaped data are used for convolutional and pooling layers.
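This is also why a typical CNN contains a Flatten layer internally: the convolutional and pooling layers work on the 2D grid, and Flatten converts their output into a 1D vector for the dense classifier head. A minimal sketch, assuming TensorFlow's Keras and the modern channels-last default (this is an illustration, not the tutorial's exact model):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Convolution operates on the 2D image: 28x28 pixels, 1 channel
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    # Flatten turns the pooled feature maps into a single 1D vector
    Flatten(),
    # Dense layers always take 1D input
    Dense(10, activation='softmax'),
])
```

So the flattening still happens in a CNN; it just happens inside the model, after the convolutional layers have had a chance to exploit the 2D structure, rather than as a preprocessing step on the data.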