# Solved – How to best evaluate a time series prediction algorithm

What's best-practice for training and evaluating a prediction algorithm on a time series?

For learning algorithms that are trained in batch mode, a naive programmer might give the raw dataset of `[(sample, expected prediction),...]` directly to the algorithm's `train()` method. This will usually show an artificially high success rate because the algorithm will effectively be "cheating" by using future samples to inform predictions made on earlier samples. When you actually try to use the trained model to predict new data in real-time, it'll probably perform terribly, since it no longer has any future data to rely on.
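To make the leakage concrete, here is a minimal sketch of the simplest guard against it: a single chronological holdout, where every training sample precedes every test sample. The `(timestamp, input, target)` tuple layout, the function name `chronological_split`, and the `train_fraction` parameter are assumptions made for illustration, not something taken from a particular library.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split samples so that all training samples come before all test samples.

    Each sample is assumed to be a (timestamp, input, target) tuple.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Deliberately unsorted toy data: timestamps 5, 1, 4, 2, 3.
samples = [(t, float(t), float(t) * 2) for t in (5, 1, 4, 2, 3)]
train, test = chronological_split(samples)

# No training timestamp may be later than any test timestamp.
print([s[0] for s in train], [s[0] for s in test])
```

A model fitted only on `train` never sees anything from the period it is tested on, which is the property the shuffled or whole-dataset approach lacks.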

My current approach is to train and evaluate as you would in real time. I take N training samples, ordered chronologically, where each sample is a tuple of the input A and the expected prediction B. I feed A to the current model and get the actual result C, compare C to B, and record the error. Then I add the sample to the local "past" subset and batch-train a new model on just that subset. I repeat this process for each training sample.

Or, to put it in pseudo-code:

```python
predictor = Predictor()
training_samples = []
errors = []
for sample in sorted(all_samples, key=lambda o: o.date):
    input_data, expected_prediction = sample

    # Test on current test slice.
    actual_prediction = predictor.predict(input_data)
    # Record True when the prediction was wrong.
    errors.append(expected_prediction != actual_prediction)

    # Re-train on all "past" samples relative to the current time slice.
    training_samples.append(sample)
    predictor = Predictor.train(training_samples)
```
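For reference, the loop above can be turned into something runnable with a toy model. This is only a sketch under stated assumptions: `Sample`, `MeanPredictor`, and `walk_forward_errors` are hypothetical names, the "model" just predicts the mean of the targets it has seen, and the first sample is skipped for scoring because no trained model exists yet.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    date: int          # any sortable timestamp
    input_data: float
    target: float


class MeanPredictor:
    """Hypothetical toy model: predicts the mean of its training targets."""

    def __init__(self, samples):
        self.value = mean(s.target for s in samples)

    def predict(self, input_data):
        return self.value


def walk_forward_errors(all_samples):
    """Predict each sample using a model trained only on earlier samples."""
    past, errors = [], []
    predictor = None
    for sample in sorted(all_samples, key=lambda s: s.date):
        # Score the current sample before the model has seen it.
        if predictor is not None:
            prediction = predictor.predict(sample.input_data)
            errors.append(abs(sample.target - prediction))

        # Retrain on everything up to and including the current sample.
        past.append(sample)
        predictor = MeanPredictor(past)
    return errors


samples = [Sample(2, 0.0, 3.0), Sample(1, 0.0, 1.0), Sample(3, 0.0, 5.0)]
print(walk_forward_errors(samples))  # [2.0, 3.0]
```

With targets 1, 3, 5 in date order, the second sample is predicted as 1 (error 2) and the third as the mean of 1 and 3 (error 3).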

This seems very thorough, since it simulates what a user would be forced to do if they had to make a prediction at each time step. But for any large dataset it would clearly be terribly slow: the model is retrained once per sample, so the algorithm's training time (which for many algorithms and large datasets is already high) gets multiplied by the number of samples.
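The cost can be counted directly. The sketch below tallies how many times a model is retrained and how many samples are fed to training in total. The `retrain_every` parameter is a hypothetical knob added for comparison: a value of 1 corresponds to the scheme described above, while larger values retrain less often at the price of predicting with a slightly stale model.

```python
def walk_forward_cost(n_samples, retrain_every=1):
    """Return (number of retrains, total samples fed to training)."""
    retrains = 0
    samples_trained_on = 0
    for seen in range(1, n_samples + 1):
        # `seen` is the size of the "past" subset after adding the current sample.
        if seen % retrain_every == 0:
            retrains += 1
            samples_trained_on += seen
    return retrains, samples_trained_on


print(walk_forward_cost(1000))        # (1000, 500500)
print(walk_forward_cost(1000, 100))   # (10, 5500)
```

Retraining after every sample processes 1 + 2 + ... + N = N(N + 1)/2 samples in total, so the work grows quadratically in N even when a single training run is linear in its input size.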

Is there a better approach?
