Solved – How to frequently update classification model with new training data

How do I incorporate a new stream of data into my classification model? Do I have to retrain the model from the beginning every time I want to incorporate new data, or can I update the existing model with each new data point?

Imagine I have built a binary classification model that predicts whether users pick Option A or Option B. (I am not concerned about which classification algorithm at this point, so let me know if you think certain algorithms are better suited to the problem I am describing.) Over time I keep collecting new data that should be included in my existing model; for example, I obtain true labels for data predicted in the past and want those to be included in the model. Is there a way to update the model without retraining it completely, or is retraining the best solution? I imagine it would be time-consuming to retrain the model entirely if data arrives steadily, and this would not work for anything that relies on near real-time predictions.

Most models in Python or R libraries are fit in batch and need to be retrained from scratch to absorb new data. Bayesian models can, in theory, incorporate new observations sequentially, but in practice they are often retrained anyway, or the posteriors are approximated by analytic distributions and then used as priors in the next round of fitting. The point is that the model needs to be refit one way or another.
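
To make the sequential Bayesian idea concrete, here is a minimal sketch using a Beta-Bernoulli model, about the simplest case where the posterior after each observation has a closed form and can serve directly as the prior for the next one. It ignores user features entirely and only estimates the overall probability of Option A. The class name `BetaBernoulli`, the uniform Beta(1, 1) prior, and the toy observations are assumptions for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class BetaBernoulli:
    """Beta distribution over P(user picks Option A). Hypothetical helper."""
    alpha: float = 1.0  # pseudo-count for Option A (assumed uniform prior)
    beta: float = 1.0   # pseudo-count for Option B (assumed uniform prior)

    def update(self, picked_a: bool) -> None:
        # Conjugate update: the posterior is again a Beta distribution,
        # so it can be reused as the prior for the next observation.
        if picked_a:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        # Posterior mean estimate of P(Option A).
        return self.alpha / (self.alpha + self.beta)


# Made-up stream of outcomes: True means the user picked Option A.
observations = [True, False, True, True, False, True]

# Sequential: fold in one observation at a time.
sequential = BetaBernoulli()
for obs in observations:
    sequential.update(obs)

# Batch: start from the same prior and use all the data at once.
batch = BetaBernoulli(
    alpha=1.0 + sum(observations),
    beta=1.0 + len(observations) - sum(observations),
)
```

Both routes end at the same posterior, which is why sequential updating is exact here. For models without a conjugate prior the posterior has to be approximated at each step, and that is where refitting tends to creep back in.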

It might be beneficial to monitor your model's performance over time. Once performance starts to degrade, you can retrain your model with more recent data. Alternatively, you can schedule models to retrain quarterly or yearly.
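
One way to do that monitoring is to track accuracy over a rolling window of the most recent predictions whose true labels have arrived, and to flag the model for retraining when it drops below a chosen level. The sketch below assumes accuracy is the metric you care about; the class name `AccuracyMonitor`, the window size, and the threshold are hypothetical choices, not recommendations.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over the most recent labeled predictions.

    Hypothetical helper; window and threshold are illustrative defaults.
    """

    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.results = deque(maxlen=window)  # oldest results fall off
        self.threshold = threshold

    def record(self, predicted, actual) -> None:
        # Store whether the past prediction matched the true label.
        self.results.append(predicted == actual)

    def accuracy(self) -> float:
        # With no labeled results yet, report 1.0 so no alarm is raised.
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_retraining(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alarms.
        full = len(self.results) == self.results.maxlen
        return full and self.accuracy() < self.threshold


# Made-up (predicted, actual) pairs arriving as true labels come in.
monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]:
    monitor.record(predicted, actual)

if monitor.needs_retraining():
    print("Rolling accuracy dropped; retrain on more recent data.")
```

A scheduled retrain can sit alongside this check as a fallback, so the model is still refreshed even if the metric never crosses the threshold.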

EDIT: Look up online learning algorithms, which learn sequentially from data, point by point, and so can be updated without a full retrain.
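
As a rough illustration of what such an algorithm looks like, here is a minimal sketch of logistic regression trained by stochastic gradient descent, one example at a time. It uses only the standard library; the class name `OnlineLogisticRegression`, the method `learn_one`, the learning rate, and the toy data stream are all assumptions for illustration. In practice you would more likely use a library implementation; scikit-learn, for example, exposes a `partial_fit` method on some estimators such as `SGDClassifier`.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary logistic regression updated one example at a time (SGD).

    Hypothetical sketch, not a library API.
    """

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        # Clamp z so math.exp cannot overflow on extreme inputs.
        z = max(-30.0, min(30.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y: int) -> None:
        # Gradient of the log loss for a single example is (p - y) * x.
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


# Toy stream (made up): label 1 means Option A, chosen when the
# single feature is positive.
rng = random.Random(0)
model = OnlineLogisticRegression(n_features=1)
for _ in range(2000):
    x = [rng.uniform(-1.0, 1.0)]
    y = 1 if x[0] > 0 else 0
    model.learn_one(x, y)  # update in place; no full retraining

print(model.predict_proba([0.8]), model.predict_proba([-0.8]))
```

Each new labeled point costs one cheap update, which is what makes this style of model workable for near real-time use. The trade-off is that the model becomes sensitive to the learning rate and to the order in which data arrives.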
