It's hard to find literature where LTSM are used with multidimensional input.
I know that LTSM admits various time series as input (multidimensional input) with the shape : (samples,look back,dimension).Dimension could be, electricity demand, temperature, pressure, day of the week,… among others.
The question is: If you want to forecast the next X samples of one of the features, is it forcing you to also forecast the other "drivers"? Thereby, has the target vector be of the same dimensionality as the input one?
Then I understand that there is not any difference between the "main" magnitude to forecast and the "drivers" as you are forced to forecast all of them.
Maybe LSTM is not designed for this use and I'm confused…
Best Answer
If you want to forecast multiple time steps then yes, your output is going to have to be of the same dimension as your input. The input dimension is fixed, so if you aren't producing a prediction for every input, some of the inputs during prediction time would be missing and it would throw off your results if you just filled them in with 0s.
If you just want to make a prediction n time steps in the future and are less worried about the predictions in between, you can just train your model on predicting time t + n at every time step t and you won't have this issue, but you also won't be able to do an arbitrary number of predictions.
To do this it is just a matter of changing what your target value is during training. Instead of making your target whatever value comes next in the sequence, you could make it whatever the value is 5 steps ahead (time t + 5) for example. Other than that everything is the same.
One other idea is to have your output be a sequence also. For example if you were trying to predict how many sales a company will have in the future. Say your data is daily observations, and you want to predict what the next 7 days will be. You can have your network output 7 values at each time step, one corresponding to each day, instead of just one. This method may take away some of the power of the LSTM though. Normally each individual prediction will update the hidden state for making the next one, but with this they would all be predicted at once. It would probably come down to testing to say whether that is a problem and which method will work the best for your data.