# Solved – Neural Networks for k step ahead time series forecasting

I am looking into neural networks and had a conceptual question about time series forecasting.

Let's say I have hourly temperature measurements at a given location for several months. My goal is to forecast, from a time t, the expected temperature for the next k hours. Which of the following architectures would be the best, recommended, or feasible?

1. The input of the neural network is the n most recent values up to a time t, \$[y_t,y_{t-1}, …,y_{t-n+1}]\$, and the output is k nodes representing the future values \$[y_{t+1},y_{t+2},…,y_{t+k}]\$.
Different values of n would be tested, and historical data would be used to train the NN.

2. The input is the same n past values, but this time k different neural networks would be trained, each for a specific time step from 1 to k.

1st neural network \$[y_t,y_{t-1}, …,y_{t-n+1}] => y_{t+1}\$

2nd neural network \$[y_t,y_{t-1}, …,y_{t-n+1}] => y_{t+2}\$

etc.

Each network would be trained separately on the historical data, and all k networks would be given the same input to produce \$[y_{t+1},y_{t+2},…,y_{t+k}]\$.

3. A single neural network is trained to produce only a 1h-ahead forecast: \$[y_t,y_{t-1}, …,y_{t-n+1}]=>y_{t+1}\$. To predict k values in the future, the network is applied iteratively, with each forecasted value used as an input at the next step:

1st step \$[y_t,y_{t-1}, …,y_{t-n+1}] => \hat{y}_{t+1}\$

2nd step \$[\hat{y}_{t+1},y_t,y_{t-1}, …,y_{t-n+2}] => \hat{y}_{t+2}\$

3rd step \$[\hat{y}_{t+2},\hat{y}_{t+1},y_t,…,y_{t-n+3}] => \hat{y}_{t+3}\$

etc.

I have the feeling that the 1st method would be very hard to train because of the large number of inputs and outputs. The first hour ahead should be more strongly correlated with the past values and thus easier to forecast. Conversely, as k becomes large, the correlation between past and future values weakens, making those hours harder to predict. A single NN combining all k hours might therefore perform poorly overall, as the later hours could degrade the overall behaviour.
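To make the input/output layout of the 1st method concrete, here is a minimal sketch of how the training pairs could be built. The function name `make_windows` and the toy temperature values are assumptions for illustration only; any network with n inputs and k outputs could then be trained on these pairs.

```python
def make_windows(series, n, k):
    """Build (input, target) pairs for a single network with k outputs.

    input  = [y_t, y_{t-1}, ..., y_{t-n+1}]  (most recent value first)
    target = [y_{t+1}, y_{t+2}, ..., y_{t+k}]
    """
    pairs = []
    # t is the index of the last observed value in each window
    for t in range(n - 1, len(series) - k):
        past = series[t - n + 1 : t + 1][::-1]
        future = series[t + 1 : t + k + 1]
        pairs.append((past, future))
    return pairs


# Made-up hourly temperatures, purely illustrative
hourly = [10.0, 11.0, 12.5, 13.0, 12.0, 11.5, 10.5, 9.0]
pairs = make_windows(hourly, n=3, k=2)

for past, future in pairs:
    print(past, "=>", future)
```

Note that a series of length N yields only N - n - k + 1 pairs, so larger n and k leave fewer training examples.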

The second architecture might compensate for that, as the neural networks for the first few hours ahead might perform well while the later ones will not. Knowing which horizons can be trusted could be somewhat useful.
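The structure of the second architecture can be sketched as a loop that fits one model per horizon. This is only a sketch under assumptions: `fit_offset_model` is a hypothetical stand-in for training a neural network (it just learns the mean change from \$y_t\$ to the target), and the ramp series is made up.

```python
def fit_offset_model(inputs, targets):
    """Stand-in for training one network: learns the mean change from y_t to the target."""
    bias = sum(y - x[0] for x, y in zip(inputs, targets)) / len(targets)
    return lambda window: window[0] + bias


def fit_direct_models(series, n, k, fit=fit_offset_model):
    """Fit k separate models; model h maps the same window to y_{t+h}."""
    models = []
    for h in range(1, k + 1):
        inputs, targets = [], []
        for t in range(n - 1, len(series) - h):
            inputs.append(series[t - n + 1 : t + 1][::-1])  # [y_t, ..., y_{t-n+1}]
            targets.append(series[t + h])                   # y_{t+h}
        models.append(fit(inputs, targets))
    return models


def forecast_direct(models, window):
    """Feed the same window to every model to get [y_{t+1}, ..., y_{t+k}]."""
    return [model(window) for model in models]


series = [float(i) for i in range(10)]  # toy ramp: rises by 1 per step
models = fit_direct_models(series, n=3, k=3)
forecast = forecast_direct(models, [9.0, 8.0, 7.0])
print(forecast)
```

Because each model is fitted independently, its validation error can be inspected per horizon, which is one way to see how far ahead the forecasts remain usable.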

The third architecture only uses one neural network, for the 1h-ahead forecast. We can expect this NN to be the most accurate of the k networks from the second architecture, so its output could be considered close enough to the real value to be used as an input for the next time step. This assumption is of course not strictly true, but perhaps for a certain number of steps k the deviation would not be too large.
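The iterative scheme of the third architecture can be sketched as follows. The one-step model here, `linear_trend`, is a hypothetical placeholder (it extrapolates the last observed change) standing in for a trained 1h-ahead network; any callable that maps a window to a single value would fit.

```python
def forecast_recursive(predict_one, window, k):
    """Apply a one-step model k times, feeding each prediction back as input.

    window = [y_t, y_{t-1}, ..., y_{t-n+1}]  (most recent value first)
    """
    window = list(window)  # copy so the caller's list is not modified
    forecasts = []
    for _ in range(k):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        # The newest prediction enters at the front; the oldest value drops off
        window = [y_hat] + window[:-1]
    return forecasts


def linear_trend(window):
    """Hypothetical one-step model: extrapolate the last observed change."""
    return window[0] + (window[0] - window[1])


forecast = forecast_recursive(linear_trend, [12.0, 11.0, 10.0], k=3)
print(forecast)
```

After n steps the window consists entirely of predicted values, so any error in the one-step model is fed back into its own inputs from that point on.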

Those are the 3 options I have thought about. Are they somewhat correct, or is there a fundamental logic behind neural networks that I haven't grasped? The literature I have found on the subject didn't go into detail on how to predict more than one step into the future.
