I'm working with an LSTM network in Keras. The first layer takes the input_shape parameter shown below.
model.add(LSTM(50, input_shape=(window_size, num_features), return_sequences=True))
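For context, as I understand it this means the data passed to fit() has to be a 3-D array shaped (num_samples, window_size, num_features); a quick sanity check with made-up numbers:

```python
import numpy as np

window_size, num_features = 30, 4            # placeholder values
# 1000 training examples, each a window of 30 consecutive time steps with 4 features
X_train = np.random.rand(1000, window_size, num_features)
print(X_train.shape)                         # (1000, 30, 4) -> (num_samples, window_size, num_features)
```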
I don't quite follow the window size parameter and the effect it has on the model. As far as I understand, to make a decision the network uses not only the current window but also information about past windows stored in its internal state.
So is the window size more of a tool for saving memory / computing requirements? Or does it have a big impact on a model?
Best Answer
So is the window size more of a tool for saving memory / computing requirements? Or does it have a big impact on a model?
It's both! Imagine you have a long text like War and Peace. Back-propagating from the end of the text all the way to the beginning is a huge effort because the sequence is so long, and most of the effect of the update concerns the most recent time steps anyway, because the words closest to the prediction point are usually the most relevant for what you're predicting (the next word). On the other hand, imagine the most extreme truncation, which only looks back 1 time step: that won't allow the model to learn any long-term dependencies, because it focuses exclusively on the most recent time step.
Picking a good window size is important, but fine-tuning (e.g. choosing between 64 and 65) isn't necessary — pick a window that's "large enough" to learn longer dependencies, and call it a day.
The term of art for truncating the number of time steps in a recurrent neural network is "truncated back-propagation through time."
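To make the truncation concrete, here is a minimal sketch of slicing one long series into fixed-length windows and training on them. The `make_windows` helper, the fake data, and the layers after the first LSTM are my own illustrative choices, not from the question; the point is that gradients only flow back window_size steps within each window, never across window boundaries, so window_size bounds the longest dependency the model can learn directly.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def make_windows(series, window_size):
    """Slice a (timesteps, num_features) series into overlapping windows.
    Each window is used to predict the first feature at the next time step."""
    X, y = [], []
    for start in range(len(series) - window_size):
        X.append(series[start:start + window_size])
        y.append(series[start + window_size, 0])
    return np.array(X), np.array(y)

# Fake data: one long series with 5 features and 10,000 time steps
series = np.random.rand(10_000, 5)
window_size, num_features = 50, series.shape[1]

X, y = make_windows(series, window_size)      # X: (9950, 50, 5), y: (9950,)

model = Sequential()
model.add(LSTM(50, input_shape=(window_size, num_features), return_sequences=True))
model.add(LSTM(20))                           # collapse the sequence to one vector
model.add(Dense(1))                           # predict the next value
model.compile(loss="mse", optimizer="adam")
model.fit(X, y, epochs=2, batch_size=32)
```

Because each window is an independent training example, increasing window_size lets the model see (and back-propagate through) longer dependencies, at the cost of more memory and compute per update.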