Solved – LSTM network window size selection and effect

I'm working with an LSTM network in Keras. The first layer has the input_shape parameter shown below.

model.add(LSTM(50, input_shape=(window_size, num_features), return_sequences=True))
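For reference, the data fed to a layer like this is typically prepared by slicing one long series into fixed-length windows, so each training sample is a single window. A minimal sketch (all shapes and numbers here are made up for illustration):

```python
import numpy as np

# Hypothetical values, for illustration only.
window_size, num_features = 30, 4
series = np.random.rand(1000, num_features)  # one long multivariate series

# Slice the series into overlapping windows of window_size steps each;
# the LSTM sees one (window_size, num_features) window per sample.
windows = np.stack([series[i:i + window_size]
                    for i in range(len(series) - window_size)])
print(windows.shape)  # (970, 30, 4) == (num_windows, window_size, num_features)
```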

I don't quite follow the window size parameter and the effect it will have on the model. As far as I understand, to make a decision the network makes use not only of the current window but also of information about past windows stored in the network's state.

So is the window size more of a tool for saving memory and compute, or does it have a big impact on the model?

It's both! Imagine you have a long text like War and Peace. Back-propagating from the end of the text all the way to the beginning is a huge effort because the text is so long, and most of the effect of each update pertains to the most recent time steps anyway, because the words immediately before a position are the most relevant to what you're predicting (the next word). On the other hand, imagine the most extreme truncation, which looks back only 1 time step. That won't allow the model to learn any long-term dependencies, because it focuses exclusively on the most recent time step.
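To make this concrete, here is a minimal next-step-prediction sketch (the data and all hyperparameters are made up) in which each window is an independent training sample, so back-propagation through time unrolls at most window_size steps and gradients never cross a window boundary:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window_size, num_features = 30, 1
series = np.sin(np.linspace(0, 100, 2000)).reshape(-1, 1)  # toy series

# Inputs are windows; the target is the value right after each window.
X = np.stack([series[i:i + window_size]
              for i in range(len(series) - window_size)])
y = series[window_size:]

model = Sequential([
    LSTM(50, input_shape=(window_size, num_features)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Each window is a separate sample, so gradients are truncated at the
# window border: a larger window_size means longer unrolls (more compute,
# more memory) but also a chance to learn longer dependencies.
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```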

Picking a good window size is important, but fine-tuning it (e.g. choosing between 64 and 65) isn't necessary; pick a window that's "large enough" to capture the longer dependencies you care about, and call it a day.

The term of art for truncating the number of time steps in a recurrent neural network is "truncated back-propagation through time."
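If you want the hidden state to survive across window boundaries while gradients stay truncated, Keras offers stateful layers. A hedged sketch (assuming tf.keras; shapes and data are again made up): with stateful=True the LSTM carries its state between consecutive batches, and you reset it manually at the start of each pass over the sequence:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window_size, num_features, batch_size = 30, 1, 1
series = np.sin(np.linspace(0, 50, 1000)).reshape(-1, 1)  # toy series

# Consecutive, non-overlapping windows so the carried state lines up.
X = np.stack([series[i:i + window_size]
              for i in range(0, len(series) - window_size, window_size)])
y = series[window_size::window_size][:len(X)]

model = Sequential([
    # stateful=True carries the hidden state from one batch to the next;
    # gradients still stop at each window boundary (truncated BPTT).
    LSTM(50, stateful=True,
         batch_input_shape=(batch_size, window_size, num_features)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for epoch in range(2):
    # shuffle=False keeps the windows in chronological order.
    model.fit(X, y, batch_size=batch_size, epochs=1,
              shuffle=False, verbose=0)
    model.reset_states()  # start fresh at the top of the sequence each epoch
```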
