## Solved – Different notions of over-parameterization

While reading a paper, I came across the statement: This prediction function will be parameterized by a parameter vector $\theta$ in a parameter space $\Theta$. Often, this prediction function will be over-parameterized and two parameters $(\theta, \theta') \in \Theta^2$ that yield the same prediction function everywhere, $\forall x \in \mathscr{X}, f_\theta(x)=f_{\theta'}(x)$, are called observationally equivalent. … Read more
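As a concrete illustration of observational equivalence, here is a minimal sketch. The tiny network, the function name `predict`, and the parameter values are all assumptions made for this example and do not come from the paper. Swapping the two hidden units, or flipping the signs of one unit's incoming and outgoing weights (tanh is odd), gives a different $\theta'$ with the same $f_{\theta'}$.

```python
import math

def predict(theta, x):
    """Hypothetical one-hidden-layer network with two tanh units.

    theta = (w1, w2, v1, v2): hidden weights w1, w2 and output weights v1, v2.
    """
    w1, w2, v1, v2 = theta
    return v1 * math.tanh(w1 * x) + v2 * math.tanh(w2 * x)

theta = (0.5, -1.3, 2.0, 0.7)

# Swap the two hidden units: a different point in parameter space.
theta_swapped = (-1.3, 0.5, 0.7, 2.0)

# Flip the sign of unit 1's input and output weights: tanh(-u) = -tanh(u).
theta_flipped = (-0.5, -1.3, -2.0, 0.7)

xs = [-2.0, -0.5, 0.0, 0.3, 1.7]
gap_swapped = max(abs(predict(theta, x) - predict(theta_swapped, x)) for x in xs)
gap_flipped = max(abs(predict(theta, x) - predict(theta_flipped, x)) for x in xs)
```

Checking a handful of inputs only suggests equality; it is the algebra (commutativity of the sum and oddness of tanh) that guarantees $f_\theta(x) = f_{\theta'}(x)$ for every $x$.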

## Solved – Any explanation for the spatial batch normalization

I read this part in the paper but I didn't fully understand it: "we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way". 1- What is the meaning of "convolutional property" and "normalized in the same way"? … Read more
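One way to read the quoted sentence is that a single mean and a single variance are computed per feature map (channel), pooled over the mini-batch and over every spatial location, so that all positions of that map share the same statistics. The sketch below assumes that reading. The `x[n][c][h][w]` layout, the function name, and the `eps` value are choices made for this example, and the learned scale and shift are left out.

```python
def spatial_batch_norm(x, eps=1e-5):
    """Normalize a batch of feature maps laid out as x[n][c][h][w].

    For each channel c, one mean and one variance are pooled over the
    batch index n and all spatial positions (h, w).
    """
    n_channels = len(x[0])
    # Output container with the same nested shape as the input.
    out = [[[[0.0 for _ in row] for row in fmap] for fmap in sample] for sample in x]
    for c in range(n_channels):
        values = [v for sample in x for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = (var + eps) ** -0.5
        for n, sample in enumerate(x):
            for h, row in enumerate(sample[c]):
                for w, v in enumerate(row):
                    out[n][c][h][w] = (v - mean) * scale
    return out

# Batch of 2 samples, 2 channels, 2x2 feature maps.
x = [
    [[[1.0, 2.0], [3.0, 4.0]], [[10.0, 10.0], [20.0, 20.0]]],
    [[[5.0, 6.0], [7.0, 8.0]], [[30.0, 30.0], [40.0, 40.0]]],
]
y = spatial_batch_norm(x)
```

Under this reading, every element of channel 0 is shifted and scaled by the same two numbers regardless of its location, which is what distinguishes it from normalizing each activation position separately.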

## Solved – Softmax with log-likelihood cost

I am working on my understanding of neural networks using Michael Nielsen's "Neural networks and deep learning." Now in the third chapter, I am trying to develop an intuition of how softmax works together with a log-likelihood cost function. http://neuralnetworksanddeeplearning.com/chap3.html Nielsen defines the log-likelihood cost associated with a training input (eq. 80) as $$C \equiv … Read more
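The equation in the excerpt is cut off, so the sketch below does not reproduce it. It assumes the usual form of a log-likelihood cost, the negative log of the softmax output for the true class, and the function names are made up for this example. Under that assumption, the gradient with respect to the weighted inputs is simply "output minus one-hot target", which is one source of intuition for why the pairing works well.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of weighted inputs."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(z, y):
    """Assumed cost: -ln(a_y), where a = softmax(z) and y is the true class index."""
    return -math.log(softmax(z)[y])

def cost_gradient(z, y):
    """Gradient of the assumed cost with respect to z: a_j - 1 for j == y, else a_j."""
    a = softmax(z)
    return [a_j - (1.0 if j == y else 0.0) for j, a_j in enumerate(a)]

z = [2.0, -1.0, 0.5]
true_class = 0
cost = log_likelihood_cost(z, true_class)
grad = cost_gradient(z, true_class)
```

The cost is near zero when the network puts almost all probability on the true class and grows without bound as that probability approaches zero.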

## Solved – Differences between Multi-layer NN, Hopfield, Helmholtz and Boltzmann machines

There are several resources describing these types of networks, but none of them explicitly address the differences between them. What are the differences between these models? Best Answer Multilayer NN (MLP) and Hopfield networks are deterministic networks. Concretely, the first can be shown to estimate the conditional average on the target data. For details you … Read more

## Solved – How to handle even and odd convolutional filter sizes and images

Is there a rule of thumb for determining the size of a convolutional filter given the shape of the input? Specifically, if you want to do a 1D convolution over an even-length vector, does the kernel need to be a divisor of the vector length? Does the kernel need to be even? I understand that … Read more
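As a rough aid for the divisibility and parity questions, the sketch below uses the standard output-length formula for a 1D convolution, floor((n + 2p - k) / s) + 1. The function names are hypothetical and the symmetric zero padding is an assumption of this example.

```python
def conv1d_output_length(n, k, stride=1, padding=0):
    """Output length for input length n, kernel size k, symmetric padding."""
    span = n + 2 * padding - k
    if span < 0:
        raise ValueError("kernel is longer than the padded input")
    return span // stride + 1

def conv1d_valid(signal, kernel):
    """Stride-1, unpadded sliding dot product (cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

n = 10                                            # even-length input
len_k3 = conv1d_output_length(n, 3)               # odd kernel, does not divide 10
len_k4 = conv1d_output_length(n, 4)               # even kernel, does not divide 10
same_k3 = conv1d_output_length(n, 3, padding=1)   # odd kernel keeps length n
pad1_k4 = conv1d_output_length(n, 4, padding=1)   # even kernel: one short
pad2_k4 = conv1d_output_length(n, 4, padding=2)   # even kernel: one long
out = conv1d_valid(list(range(n)), [1, 0, -1])
```

In this sketch, neither kernel size divides the input length and both produce a valid output. Parity only matters when the output must keep the input length: with symmetric padding an odd kernel can do so exactly, while an even kernel ends up one element short or one long.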

## Solved – Is a neural network essential for deep learning

I received preliminary materials on deep learning in my class. They contained the following statement, which made me question the basic meaning of the term deep learning: "Deep learning is a machine learning method using a multi-layer neural network." Is a neural network essential for deep learning? Isn't it possible to do deep … Read more

## Solved – If softmax is used as an activation function for output layer, must the number of nodes in the last hidden layer equal the number of output nodes

Let us assume that I have the following neural network architecture: Input Layer: 12 nodes; 1st Hidden Layer: 9 nodes; 2nd Hidden Layer: 6 nodes; Output Layer: 3 nodes. Can I use the softmax activation function on the output layer for the above architecture? If so, how? Because, in the Softmax formula how will I get the … Read more
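A minimal sketch of a forward pass through the 12-9-6-3 architecture described above may help. It assumes fully connected layers, tanh hidden activations, and random untrained weights, none of which is specified in the question. The point it illustrates is that the output layer has its own 3×6 weight matrix, so softmax is applied to 3 weighted inputs even though the last hidden layer has 6 nodes.

```python
import math
import random

def dense(inputs, weights, biases):
    """Weighted inputs of a fully connected layer: one row of weights per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random illustrative weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
sizes = [12, 9, 6, 3]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x):
    a = x
    for weights, biases in layers[:-1]:
        a = [math.tanh(v) for v in dense(a, weights, biases)]  # hidden layers
    z = dense(a, *layers[-1])  # 6 hidden activations -> 3 weighted inputs
    return softmax(z)          # softmax over the 3 output nodes

probs = forward([rng.random() for _ in range(12)])
```

The size of the last hidden layer only fixes the number of columns in the output layer's weight matrix; the number of softmax outputs is the number of rows, here 3.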
