I understand the importance of out-of-sample testing, but could you tell me why I should (or shouldn't) do out-of-time testing?
The only use that comes to mind is if the predictive model applies to economic activity and you want to see whether it would work in both bull and bear markets. But more insight into the use and importance of out-of-time testing would be very welcome.
This post and question are a bit old, but very relevant to what I am doing at the moment.
I am in the position where I trained and scored a random forest (RF) model that uses weather and time-of-day features to predict the energy consumption of a building from time-series data. At first I took a simple out-of-sample approach and just split the samples randomly into a training set and a withheld test set. When scoring, I got extremely high R-squared values on my test set that almost matched the scores the RF achieved on my training set (within tenths of a percent, both in the high 90s). This seemed fishy: just too good to be true.
So I cut the last few weeks out of the time series and used those as the test set instead: out-of-time testing. Sure enough, my scores fell to around 0.80 and below. It made sense when I thought about it: although the original test points were withheld, they were simply too similar (close in time and in weather circumstances) to the training points to be a real challenge for the prediction.
So what I learned from this is that if there is a time component in your data, an out-of-time test is probably a much better predictor of your algorithm's accuracy on new data in production than random out-of-sample testing.