I am trying to see the power of recurrent neural calculations. I give the NN just one feature, a timeseries datum one step in the past, and predict a current datum. The timeseries is however double-seasonal with considerably long ACF structure (about 64) with additive shorter seasonality for lag 6.
You could note it is shifted. I checked my vectors, and they seem OK.
MSE residuals are also quite bad (I expect 0.01 on both train validation thanks to Gaussian noise added with sigma = 0.1):
> head(x_train) [1] 0.9172955 0.9285578 0.4046166 -0.4144658 -0.3121450 0.3958689 > head(y_train) [,1] [1,] 0.9285578 [2,] 0.4046166 [3,] -0.4144658 [4,] -0.3121450 [5,] 0.3958689 [6,] 1.5823631
Q: am I doing something wrong in terms of LSTM acrchitecture, or data preparation, or batching?
library(keras) library(data.table) # constants features <- 1 timesteps <- 1 x_diff <- sin(seq(0.1, 100, 0.1)) + sin(seq(1, 1000, 1)) + rnorm(1000, 0, 0.1) #x_diff <- ((x_diff - min(x_diff)) / (max(x_diff) - min(x_diff)) - 0.5) * 2 # generate training data train_list <- list() train_y_list <- list() for( i in 1:(length(x_diff) / 2 - timesteps) ) { train_list[[i]] <- x_diff[i:(timesteps + i - 1)] train_y_list[[i]] <- x_diff[timesteps + i] } x_train <- unlist(train_list) y_train <- unlist(train_y_list) x_train <- array(x_train, dim = c(length(train_list), timesteps, features)) y_train <- matrix(y_train, ncol = 1) # generate validation data val_list <- list() val_y_list <- list() for( i in (length(x_diff) / 2):(length(x_diff) - timesteps) ) { val_list[[i - length(x_diff) / 2 + 1]] <- x_diff[i:(timesteps + i - 1)] val_y_list[[i - length(x_diff) / 2 + 1]] <- x_diff[timesteps + i] } x_val <- unlist(val_list) y_val <- unlist(val_y_list) x_val <- array(x_val, dim = c(length(val_list), timesteps, features)) y_val <- matrix(y_val, ncol = 1) ## lstm (stacked) ---------------------------------------------------------- # define and compile model # expected input data shape: (batch_size, timesteps, features) rm(fx_model) fx_model <- keras_model_sequential() %>% layer_lstm( units = 32 #, return_sequences = TRUE , input_shape = c(timesteps, features) ) %>% #layer_lstm(units = 16, return_sequences = TRUE) %>% #layer_lstm(units = 16) %>% # return a single vector dimension 16 #layer_dropout(rate = 0.5) %>% layer_dense(units = 4, activation = 'tanh') %>% layer_dense(units = 1, activation = 'linear') %>% compile( loss = 'mse', optimizer = 'RMSprop', metrics = c('mse') ) # train # early_stopping <- # callback_early_stopping( # monitor = 'val_loss' # , patience = 10 # ) history <- fx_model %>% fit( x_train, y_train, batch_size = 50, epochs = 100, validation_data = list(x_val, y_val) ) plot(history) ## plot predict fx_predict <- data.table( forecast = as.numeric(predict( fx_model , x_val )) , fact = as.numeric(y_val[, 1]) , timestep = 1:length(x_diff[(length(x_diff) / 2):(length(x_diff) - timesteps)]) ) fx_predict_melt <- melt(fx_predict , id.vars = 'timestep' , measure.vars = c('fact', 'forecast') ) ggplot( fx_predict_melt[timestep < 301, ] , aes(x = timestep , y = value , group = variable , color = variable) ) + geom_line( alpha = 0.95 , size = 1 ) + ggplot_theme
Best Answer
So, after trying many input and parameter tweaks, I came to a conclusion that LSTM cannot long dependencies until it gets long enough vector of past time series values. In my experiments a so-so good quality of forecast could be obtained after feeding the net with 64 lags, which span over the seasonalities in the model.
Another thing is that minibatches are a bad idea if they were sampled randomly. In the realization of neural networks I played with I made it work with 100% of examples passed in iteration. That way I ensured that all examples come in time-wise sequences.
Also it is worth mentioning that the LSTM result compared poorly against a linear benchmarking model.
If you think I am wrong, give me good counter arguments.