As a warm-up with recurrent neural networks, I'm trying to predict one sine wave from another sine wave of a different frequency.
My model is a simple RNN; its forward pass can be expressed as follows:
$$
\begin{aligned}
r_t &= \sigma(W_{in} \cdot x_t + W_{rec} \cdot r_{t-1})\\
z_t &= W_{out} \cdot r_t
\end{aligned}
$$
where $\sigma$ is the sigmoid function.
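For concreteness, here's a minimal NumPy sketch of that forward pass (the shapes and initialization are my own assumptions, not taken from the actual code):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical dimensions: scalar input/output, hidden state of size 32.
input_size, hidden_size, output_size = 1, 32, 1
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_rec = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_out = rng.normal(scale=0.1, size=(output_size, hidden_size))

def forward(xs):
    """Run the RNN over a sequence xs of shape (T, input_size)."""
    r = np.zeros(hidden_size)
    zs = []
    for x_t in xs:
        r = sigmoid(W_in @ x_t + W_rec @ r)  # r_t = sigma(W_in x_t + W_rec r_{t-1})
        zs.append(W_out @ r)                 # z_t = W_out r_t
    return np.array(zs)
```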
When both the input and the expected output are sine waves of the same frequency, with (possibly) a phase shift, the model converges to a reasonable approximation.
However, in the following case, the model converges to a local minimum and predicts zero all the time:
- input: $x = \sin(t)$
- expected output: $y = \sin(\frac{t}{2})$
Here's what the network predicts when given the full input sequence after 10 epochs of training, using mini-batches of size 16, a learning rate of 0.01, a sequence length of 16, and hidden layers of size 32: [plot omitted; the prediction is essentially flat at zero]
This leads me to think the network is unable to learn through time and relies only on the current input to make its prediction.
I tried tuning the learning rate, the sequence length, and the hidden layer size, without much success.
I'm having the exact same issue with an LSTM. I don't want to believe these architectures are that flawed, so any hints on what I'm doing wrong?
I'm using the rnn package for Torch; the code is in a Gist.
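For reference, here's a minimal PyTorch reconstruction of the setup described above (the real code is Torch/Lua with the rnn package; the time range, sampling step, SGD optimizer, and tanh nonlinearity below are my assumptions, since nn.RNN doesn't offer a sigmoid nonlinearity):

```python
import torch
import torch.nn as nn

seq_len, batch_size, hidden_size, lr, epochs = 16, 16, 32, 0.01, 10

# Build the dataset: x = sin(t), y = sin(t / 2), chopped into sequences
# (the range and step of t are arbitrary choices for this sketch).
t = torch.arange(0, 200.0, 0.1)
x = torch.sin(t).reshape(-1, seq_len, 1)   # (num_seqs, seq_len, 1)
y = torch.sin(t / 2).reshape(-1, seq_len, 1)

class SimpleRNN(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.RNN only offers tanh/relu, so tanh stands in for the sigmoid above.
        self.rnn = nn.RNN(1, hidden_size, nonlinearity="tanh", batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):
        r, _ = self.rnn(x)   # r_t for every time step
        return self.out(r)   # z_t = W_out r_t

model = SimpleRNN()
opt = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()

for epoch in range(epochs):
    for i in range(0, len(x), batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
```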
Best Answer
Your data basically cannot be learned with an RNN trained that way. Your input $\sin(t)$ is $2\pi$-periodic, $\sin(t) = \sin(t + 2\pi)$,
but your target $\sin(t/2)$ is $4\pi$-periodic, and $\sin\left(\frac{t + 2\pi}{2}\right) = -\sin\left(\frac{t}{2}\right)$.
Therefore, your dataset contains pairs of identical inputs with opposite targets. In terms of Mean Squared Error, the best prediction for such an input is the average of the two targets, which is zero everywhere: the optimal solution is the null function.
Here are two slices of your plot where you can see identical inputs but opposite targets: [plots omitted]
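You can verify this numerically; a quick NumPy check (mine, not from the original post):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
x1, y1 = np.sin(t), np.sin(t / 2)                             # first period
x2, y2 = np.sin(t + 2 * np.pi), np.sin((t + 2 * np.pi) / 2)   # second period

assert np.allclose(x1, x2)    # identical inputs...
assert np.allclose(y1, -y2)   # ...but opposite targets

# For each input, the squared error (y1 - p)^2 + (y2 - p)^2 is minimized
# at p = (y1 + y2) / 2 = 0: the null function wins under MSE.
print(np.abs((y1 + y2) / 2).max())  # ~0
```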