I have a question about using the bsts package. In general, I want to know whether my approach is feasible, because my holdout MAPE is much worse than that of all the other approaches in my ensemble.
Here is my code.
```r
library("bsts")
library("ggplot2")
library("reshape")

# model.data: the daily sales data described below (columns DATUM, ITEMS, V7, ...)

# split into test and train ------------------------------------------------------
date <- as.Date("2017-06-04")
horizon <- 105
model.data$DATUM <- as.Date(model.data$DATUM)
xtrain <- model.data[model.data$DATUM <= date, ]
xtest  <- model.data[model.data$DATUM > date, ]

# building the first model -------------------------------------------------------
ss <- list()
ss <- AddSemilocalLinearTrend(ss, xtrain$ITEMS)
# annual seasonality: 52 seasons, each lasting 7 days
ss <- AddSeasonal(ss, xtrain$ITEMS, nseasons = 52, season.duration = 7)
# V7 is a dummy variable for the one outlier
fit <- bsts(ITEMS ~ V7, data = xtrain, seed = 100,
            state.specification = ss, niter = 1500)

# validation ---------------------------------------------------------------------
burn <- SuggestBurn(0.1, fit)
# with a regression component (V7), the forecast horizon is nrow(xtest);
# the horizon argument is then ignored
fcast.holdout <- predict(fit, newdata = xtest, horizon = horizon, burn = burn)

validation.time <- data.frame(
  "semi.local.linear.bsts" = as.numeric(fcast.holdout$mean),
  "actual" = model.data[model.data$DATUM > date, "ITEMS"],
  "datum"  = model.data[model.data$DATUM > date, "DATUM"])
a <- melt(validation.time, id.vars = c("datum"))

# forecast vs. actuals over the holdout period
ggplot(data = a, aes(x = datum, y = value, group = variable, color = variable)) +
  geom_point() +
  geom_line()

plot(fcast.holdout)
```
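The holdout MAPE mentioned above compares the forecast means with the actuals. A minimal sketch of the computation, assuming no actual value is zero (MAPE divides by the actuals, so closed days with zero sales would break it); the numbers below are made up for illustration:

```r
# Mean absolute percentage error, in percent.
# Assumes no actual value is zero.
mape <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast))
  100 * mean(abs((actual - forecast) / actual))
}

# Toy example with hypothetical numbers (not the question's data)
mape(c(100, 200, 50), c(110, 180, 50))
# [1] 6.666667
```

With the code above, `mape(validation.time$actual, validation.time$semi.local.linear.bsts)` would give the holdout MAPE for the bsts forecast.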
The data can be found here. They are daily sales figures for a retail shop. Later I want to include some additional dummy variables, which you can also find in the example data.
My main questions are:
1. Is the seasonal component defined correctly? My data have an annual seasonality and also a weekly pattern. However, the weekly pattern does not show up in the validation plot.
2. Why are my prediction intervals so wide? Should I change the trend component?
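With `nseasons = 52` and `season.duration = 7`, `AddSeasonal` describes an annual cycle made of 52 week-long seasons, so every day within a given week gets the same seasonal effect; a day-of-week pattern needs its own component with `nseasons = 7`. A minimal sketch on simulated daily data (the series, its length, the day-of-week effects, and the small `niter` are all assumptions for illustration, not the question's data):

```r
library(bsts)

set.seed(1)
n <- 2 * 364                         # two simulated "years" of 364 days
t <- seq_len(n)
# Hypothetical day-of-week effects (sum to 0) plus a smooth annual cycle and noise
dow <- c(0, 5, 5, 6, 8, 15, -39)
y <- 100 + dow[(t - 1) %% 7 + 1] + 20 * sin(2 * pi * t / 364) + rnorm(n, sd = 5)

ss <- list()
ss <- AddSemilocalLinearTrend(ss, y)
# Day-of-week pattern: 7 seasons, each lasting one day
ss <- AddSeasonal(ss, y, nseasons = 7)
# Annual pattern: 52 seasons, each lasting 7 days (one week)
ss <- AddSeasonal(ss, y, nseasons = 52, season.duration = 7)

fit <- bsts(y, state.specification = ss, niter = 100, ping = 0, seed = 100)
pred <- predict(fit, horizon = 28, burn = SuggestBurn(0.1, fit))
# Show only the last 90 observations before the forecast
plot(pred, plot.original = 90)
```

The forecast plot should then show a repeating 7-day shape on top of the slower annual movement.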
Best Answer
Clean out the outlier instead of using a dummy variable (e.g. with tsclean() from the forecast package). Try AddTrig instead of AddSeasonal for the seasonal component, since your data seem to have multiple seasonalities.
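One way to read this suggestion, sketched on simulated data: replace the outlier with `tsclean()` and model the annual cycle with a few harmonics via `AddTrig`, keeping a 7-day `AddSeasonal` for the weekly pattern. The simulated series, the injected outlier, the choice of `frequencies = 1:3`, and keeping `AddSeasonal` for the day of week are assumptions, not part of the answer:

```r
library(bsts)
library(forecast)  # for tsclean()

set.seed(1)
n <- 2 * 365
t <- seq_len(n)
dow <- c(0, 5, 5, 6, 8, 15, -39)     # hypothetical day-of-week effects
y <- 100 + dow[(t - 1) %% 7 + 1] + 20 * sin(2 * pi * t / 365.25) + rnorm(n, sd = 5)
y[200] <- 400                        # hypothetical single outlier

# Replace the outlier instead of modelling it with a dummy variable
y.clean <- as.numeric(tsclean(ts(y, frequency = 7)))

ss <- list()
ss <- AddSemilocalLinearTrend(ss, y.clean)
ss <- AddSeasonal(ss, y.clean, nseasons = 7)                     # day of week
ss <- AddTrig(ss, y.clean, period = 365.25, frequencies = 1:3)  # annual cycle

fit <- bsts(y.clean, state.specification = ss, niter = 200, ping = 0, seed = 100)
pred <- predict(fit, horizon = 28)
plot(pred, plot.original = 90)
```

The trig component uses 2 states per frequency (6 here) instead of the 51 states of a 52-season `AddSeasonal`, which keeps the annual cycle smooth and the state vector small.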
What other methods are you using that are giving better results than BSTS?