Model Drift in Machine Learning — Data Science
We expect our model to perform in production the same way it did on the training data. However, if the distribution of the production data differs from that of the training data, this may lead to model drift. Model drift refers to the decay of the model’s predictive power over time.
Model Drift occurs when:
- Training data is poorly sampled
- There is a change in the underlying business context
Why is it important to monitor Model Drift?
Monitoring model performance for drift is necessary to keep predictions accurate and to check whether retraining is required.
Kinds of Drift
Data Drift/Feature Drift: When there is a change in the distribution of the input features.
Target Drift: When there is a change in the distribution of the target variable.
Concept Drift: When there is a change in the pattern or relationship between the predictors and the outcome.
Data drift is also known as feature drift, population drift or covariate drift. It is observed when the distribution of features in production differs from their distribution in the training data. The model would still perform well on data that is similar to the “old” data on which it was trained.
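One way to check a single numeric feature for this kind of shift is the Population Stability Index (PSI), which compares the binned distribution of the feature in training and in production. The sketch below is a minimal, standard-library version. The function name `psi`, the quantile-based binning, the `n_bins` and `eps` parameters and the sample data are all illustrative assumptions, not part of any particular monitoring tool.

```python
import math
from bisect import bisect_right

def psi(train_values, prod_values, n_bins=10, eps=1e-6):
    """Population Stability Index of one numeric feature (0.0 means identical bins)."""
    ordered = sorted(train_values)
    # Assumption: bin edges are taken at quantiles of the training sample.
    edges = [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps keeps empty bins from causing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p_train = proportions(train_values)
    p_prod = proportions(prod_values)
    return sum((q - p) * math.log(q / p) for p, q in zip(p_train, p_prod))

# Hypothetical feature values: production ages are shifted upwards by 15 years.
train_ages = [20 + (i % 40) for i in range(400)]
prod_ages = [35 + (i % 40) for i in range(400)]
print("PSI, train vs train:", psi(train_ages, train_ages))
print("PSI, train vs production:", psi(train_ages, prod_ages))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but a sensible threshold depends on the feature and the use case.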
Why does data drift occur?
- Selection bias: When the training sample is not representative of the population.
- Non-stationary environment: When the mean and variance of the data change over time.
Examples of Data Drift:
Churn Analysis: Our model might fail to predict customer churn, or the reasons behind it, when a competitor introduces a new pricing model that we did not take into account while training.
Employee Attrition Analysis: Our model will not be able to predict employee attrition correctly if there is a change in industry demand or if there is demand for a new skill set.
Target drift is observed when there is a change in the distribution of the target variable (the dependent variable).
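For a categorical target, a simple check is to compare the class proportions seen in training with those seen in production. The sketch below uses the total variation distance, which is 0.0 when the proportions match and 1.0 when the two samples share no classes. It assumes that production labels (or predictions used as a stand-in for them) are available. The function names and the churn labels are hypothetical.

```python
from collections import Counter

def class_proportions(labels):
    """Share of each class in a list of labels."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

def total_variation_distance(train_labels, prod_labels):
    """Half the summed absolute difference between class proportions."""
    p = class_proportions(train_labels)
    q = class_proportions(prod_labels)
    classes = set(p) | set(q)  # includes classes that appear in only one sample
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)

# Hypothetical churn labels: the churn share doubles in production.
train_labels = ["churn"] * 20 + ["stay"] * 80
prod_labels = ["churn"] * 40 + ["stay"] * 60
print(total_variation_distance(train_labels, prod_labels))
```

A class that appears only in production, such as a newly introduced product category, contributes its full share to the distance, so it shows up clearly in this measure.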
Examples of Target Drift:
Recommendation in an e-commerce platform: Our model will not be able to provide correct recommendations if new categories or products are introduced after the model was deployed to production.
House Price Prediction: Our model will not predict the price of a house (the target variable) correctly if there is a change in the value of the currency.
Concept drift is observed when there is a change in the relationship between the target/predicted variable and the predictor variables.
Types of concept drift:
Gradual/Incremental Drift — This occurs due to gradual changes in external factors.
Examples of Gradual Concept Drift:
Competitive factors: If a competitor introduces new products, our sales forecasting model will fail to predict sales correctly.
Mechanical wear and tear of equipment: As our machines slow down, they no longer produce goods with the same efficiency, so our manufacturing model will gradually lose accuracy.
Sudden Drift — This occurs due to sudden/unforeseen changes in external factors.
Examples of Sudden Concept Drift:
Demand for healthcare facilities during Covid-19: Models used by pharmacies to predict demand for medicines failed when demand for certain medicines surged with the rise of Covid-19.
E-commerce sales prediction: Sales on online platforms rose during the nationwide Covid-19 lockdown, a change that models trained on pre-lockdown data could not anticipate.
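Both gradual and sudden concept drift tend to show up as a rise in prediction error once the true outcomes become known. A minimal way to watch for this is to compute the error rate over consecutive windows of predictions and flag the first window that exceeds a baseline by some tolerance. The function names, the window size, the baseline and the tolerance below are illustrative assumptions. In practice the baseline would usually come from validation performance at deployment time.

```python
def windowed_error_rates(y_true, y_pred, window):
    """Error rate of each consecutive, non-overlapping window of predictions."""
    rates = []
    for start in range(0, len(y_true) - window + 1, window):
        pairs = zip(y_true[start:start + window], y_pred[start:start + window])
        errors = sum(t != p for t, p in pairs)
        rates.append(errors / window)
    return rates

def first_drifted_window(rates, baseline, tolerance):
    """Index of the first window whose error rate exceeds baseline + tolerance, else None."""
    for i, rate in enumerate(rates):
        if rate > baseline + tolerance:
            return i
    return None

# Hypothetical outcomes: predictions stay accurate at first, then degrade sharply.
y_true = [1] * 30
y_pred = [1] * 10 + [1] * 9 + [0] + [0] * 5 + [1] * 5
rates = windowed_error_rates(y_true, y_pred, window=10)
print(rates, first_drifted_window(rates, baseline=0.05, tolerance=0.1))
```

A sudden drift appears as a jump between adjacent windows, while a gradual drift appears as a slow climb, so the tolerance and window size decide how early each kind is flagged.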
How should we deal with Model Drift?
- If drift is observed in the model, it is important to retrain the model.
- We might use both the old data and the new data to retrain our model. While retraining, we can assign a higher weight to new data so that our model gives priority to the latest patterns.
- If we have enough new data available, we can discard the older data.
- We might need to modify the scope of the model. We might also need to run the model more frequently.
- If model drift is caused by internal changes, those changes should be communicated to the data analytics team. This will ensure that business owners and model maintainers are aligned.
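The idea of giving newer data a higher weight during retraining can be sketched with an exponential decay on the age of each training row. The function name `recency_weights` and the 90-day half-life are assumptions chosen for illustration. The right decay rate depends on how quickly the patterns in the data change.

```python
def recency_weights(ages_in_days, half_life_days=90.0):
    """Weight 1.0 for brand-new rows, halving every `half_life_days`."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]

# Hypothetical row ages in days: collected today, 90 days ago and 180 days ago.
weights = recency_weights([0, 90, 180])
print(weights)
```

Weights like these can be passed to any training routine that accepts per-row weights. For example, many scikit-learn estimators take a `sample_weight` argument in `fit`.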